Researchers from ETH Zurich Introduce GoT (Graph of Thoughts): A Machine Learning Framework that Advances Prompting Capabilities in Large Language Models (LLMs)

Artificial Intelligence (AI) has seen a rise in the use of Large Language Models (LLMs). In particular, decoder-only Transformer models such as GPT, PaLM, and LLaMA have gained massive popularity in recent times. Prompt engineering has proven a successful and resource-efficient way to apply LLMs to diverse problems: task-specific instructions are embedded in the input text, and if these instructions are properly written, the LLM can use its autoregressive token generation to produce pertinent text and complete the task.

The Chain-of-Thought (CoT) method expands on prompt engineering: in addition to the task description, the input prompt includes intermediate steps of deliberation, or thoughts. This addition considerably improves the LLM’s problem-solving ability without any model updates. The recently introduced Graph of Thoughts (GoT) framework goes a step further, positioning itself against existing paradigms such as CoT and Tree of Thoughts (ToT).

GoT represents the reasoning process as an arbitrary graph, enabling LLMs to generate and combine information more flexibly. Individual pieces of information, or LLM thoughts, are vertices in this graph, while the connections and dependencies among them are edges. Because thoughts can be coupled and made interdependent inside the graph, different LLM ideas can be merged to produce more potent and effective results. Unlike linear paradigms, which constrain reasoning to a single chain, GoT can capture complex networks of thoughts: it can combine various ideas into a cohesive answer, distill intricate thought networks to their essential components, and improve ideas through feedback loops.
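To make this representation concrete, here is a minimal, hypothetical Python sketch of how thoughts and their dependencies might be stored as a directed graph. The class and method names are our own illustration, not the authors’ API:

class ThoughtGraph:
    """Toy container for LLM thoughts (vertices) and dependencies (edges)."""

    def __init__(self):
        self.thoughts = {}  # vertex id -> thought text produced by the LLM
        self.edges = []     # (src, dst) pairs: dst depends on src

    def add_thought(self, tid, text, parents=()):
        self.thoughts[tid] = text
        for parent in parents:
            self.edges.append((parent, tid))

    def aggregate(self, tid, parent_ids, combine):
        # Merge several existing thoughts into a new vertex, GoT's key operation.
        merged = combine([self.thoughts[p] for p in parent_ids])
        self.add_thought(tid, merged, parents=parent_ids)

graph = ThoughtGraph()
graph.add_thought("a", "sorted first half")
graph.add_thought("b", "sorted second half")
graph.aggregate("c", ["a", "b"], combine=lambda parts: "merge: " + " + ".join(parts))
print(graph.thoughts["c"])  # combined thought built from two parents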

GoT’s effectiveness shows in its performance across multiple tasks. In a sorting test, GoT improves sorting quality by 62% over ToT while reducing computing costs by more than 31%, demonstrating its capacity to balance task accuracy with resource efficiency. Extensibility is another of GoT’s most notable benefits: the framework is easily adaptable to new thought transformations, so it can lead creative prompting schemes as the LLM research and application landscape evolves.

By establishing the GoT framework, this work moves LLM reasoning closer to human thinking processes and brain mechanisms, in which thoughts interact, branch out, and influence one another in complex networks. GoT thus bridges the gap between conventional linear techniques and these sophisticated, network-like mental processes, improving LLMs’ capacity to handle challenging problems.


Meet AutoGPTQ: An Easy-to-Use LLMs Quantization Package with User-Friendly APIs based on GPTQ Algorithm

Researchers from Hugging Face have introduced an innovative solution to address the challenges posed by the resource-intensive demands of training and deploying large language models (LLMs). Their newly integrated AutoGPTQ library in the Transformers ecosystem allows users to quantize and run LLMs using the GPTQ algorithm.

In natural language processing, LLMs have transformed various domains through their ability to understand and generate human-like text. However, the computational requirements for training and deploying these models have posed significant obstacles. To tackle this, the researchers integrated the GPTQ algorithm, a quantization technique, into the AutoGPTQ library. This advancement enables users to execute models in reduced bit precision (8, 4, 3, or even 2 bits) with negligible accuracy degradation and inference speed comparable to fp16 baselines, especially for small batch sizes.

GPTQ, categorized as a Post-Training Quantization (PTQ) method, optimizes the trade-off between memory efficiency and computational speed. It adopts a hybrid quantization scheme where model weights are quantized as int4, while activations are retained in float16. Weights are dynamically dequantized during inference, and actual computation is performed in float16. This approach brings memory savings due to fused kernel-based dequantization and potential speedups through reduced data communication time.
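As a rough illustration of this weight-only scheme (a simplified sketch of our own, not the actual fused GPU kernels), int4 weights with per-channel scales are mapped back to float16 just before the matrix multiply:

import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float16)  # original fp16 weight matrix
x = rng.standard_normal((1, 256)).astype(np.float16)    # fp16 activations

# Symmetric int4 quantization with one scale per output channel ([-8, 7] range).
scale = np.abs(w).max(axis=0) / 7.0
w_int4 = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # stored as 4-bit values

# At inference time the weights are dequantized on the fly and the matmul runs in fp16.
w_dequant = (w_int4 * scale).astype(np.float16)
y = x @ w_dequant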

The researchers tackled the challenge of layer-wise compression in GPTQ by leveraging the Optimal Brain Quantization (OBQ) framework. They developed optimizations that streamline the quantization algorithm while maintaining model accuracy. Compared to traditional PTQ methods, GPTQ demonstrated impressive improvements in quantization efficiency, reducing the time required for quantizing large models.

Integration with the AutoGPTQ library simplifies the quantization process, allowing users to leverage GPTQ for various transformer architectures easily. With native support in the Transformers library, users can quantize models without complex setups. Notably, quantized models retain their serializability and shareability on platforms like the Hugging Face Hub, opening avenues for broader access and collaboration.
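In practice, quantizing a model through the integration looks roughly like the following. The model id is a placeholder, the optimum and auto-gptq packages must be installed, and the argument names follow the Transformers documentation at the time of the release, so they may differ across versions:

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # placeholder; any supported causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit GPTQ quantization, calibrated on the c4 dataset.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)
model.save_pretrained("opt-125m-gptq")  # quantized weights stay serializable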

The integration also extends to the Text-Generation-Inference library (TGI), enabling GPTQ models to be deployed efficiently in production environments. Users can harness dynamic batching and other advanced features alongside GPTQ for optimal resource utilization.

While the AutoGPTQ integration presents significant benefits, the researchers acknowledge room for further improvement. They highlight the potential for enhancing kernel implementations and for exploring quantization techniques that cover activations as well as weights. The integration currently focuses on decoder-only or encoder-only LLM architectures, limiting its applicability for other models.

In conclusion, integrating the AutoGPTQ library in Transformers by Hugging Face addresses resource-intensive LLM training and deployment challenges. By introducing GPTQ quantization, the researchers offer an efficient solution that optimizes memory consumption and inference speed. The integration’s wide coverage and user-friendly interface signify a step toward democratizing access to quantized LLMs across different GPU architectures. As this field continues to evolve, the collaborative efforts of researchers in the machine-learning community hold promise for further advancements and innovations.


MIT Researchers Developed an Artificial Intelligence (AI) Technique that Enables a Robot to Develop Complex Plans for Manipulating an Object Using its Entire Hand

Whole-body manipulation is a strength of humans but a weakness of robots. Consider a robot carrying a box: it interprets each possible contact point between the box and its fingers, arms, or torso as a separate contact event, and with billions of possible contact events, the task quickly becomes difficult to plan for. Now, MIT researchers can streamline this process, called contact-rich manipulation planning, using an artificial intelligence technique called smoothing, which reduces the vast number of contact events to a smaller number of decisions needed to find a good manipulation plan for the robot.

New developments in reinforcement learning (RL) have demonstrated impressive results in manipulation through contact-rich dynamics, something previously hard to achieve with model-based techniques. While these techniques are effective, it has remained unclear why they succeed where model-based approaches fail. The overarching objective is to understand these factors from a model-based vantage point and, based on that understanding, to merge RL’s empirical success with the generalizability and efficiency of models.

The hybrid nature of contact dynamics presents the greatest challenge to planning through contact from a model-based perspective. Because the resulting dynamics are non-smooth, the Taylor approximation is no longer locally valid, and the linear model built from the gradient quickly breaks down. Since both iterative gradient-based optimization and sampling-based planning rely on local distance metrics, the invalidity of the local model poses serious difficulties for both. In response, numerous publications have attempted to account for contact modes by enumerating or sampling them. These planners, armed with a model-based understanding of the dynamic modes, often alternate between continuous-state planning within the current contact mode and a discrete search for the next mode, producing trajectories with only a handful of mode switches.

The researchers’ first contribution is proof that the two smoothing strategies are theoretically equivalent for basic systems under their framework. Using this framework, they also show how to efficiently compute the locally linear models (i.e., gradients) of the smoothed dynamics in real time, and they demonstrate that the qualitative characteristics and empirical performance of the two smoothing schemes are comparable across various complex examples.

What is the best model for gradient-based contact-rich manipulation planning? This question sits at the center of the researchers’ model-based planning approach. To efficiently compute local linearizations for planning, they argue this model must be (i) numerically robust and (ii) differentiable. For the planner to see far into the future with minimal effort, the model must (iii) forecast long-term behavior. Finally, the model needs to (iv) be smooth, so that gradients remain informative across contact modes.

The second contribution is a convex, differentiable model of contact dynamics. In particular, the researchers propose an implicit time-stepping contact model whose convexity comes from Anitescu’s relaxation of frictional contact constraints. The relaxation does introduce some mildly non-physical behavior, but compared with the standard Linear Complementarity Problem (LCP) formulation, convexity offers significant numerical benefits.

The quasi-dynamic assumption is commonly employed in robotic manipulation because it allows long-term predictability. Quasi-dynamic models need no velocity or damping variables because kinetic energy is assumed to dissipate at each time step. The researchers verify and test the quasi-dynamic contact model by simulating and executing the same input paths in Drake, a high-fidelity second-order simulator, and on hardware. The results suggest that the model approximates the second-order dynamics well when the system is highly damped and dominated by frictional forces.

In addition, a log-barrier relaxation can soften the contact model analytically: as is typical in interior-point methods for convex programs, a log-barrier function is used to relax the hard contact constraints. The implicit function theorem then provides a straightforward way to compute the gradients of the smoothed contact model. Finally, the authors argue that RL’s global optimization with stochasticity is another major element behind its empirical success: planning through nonlinear dynamics with deterministic models typically yields non-convex optimization problems, where the quality of the many local minima can be decisive.
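To see why smoothing helps, consider a toy one-dimensional contact force f(x) = max(0, -kx), which is zero until penetration and non-smooth at the contact boundary. The following illustrative sketch (ours, not the authors’ dynamics) contrasts the two smoothing styles the paper compares: randomized smoothing, which averages the force under injected noise as RL implicitly does, and analytic smoothing, which replaces the hard max with a log-barrier-style softplus:

import numpy as np

k = 100.0
f = lambda x: np.maximum(0.0, -k * x)  # hard contact force, non-smooth at x = 0

# Randomized smoothing: Monte Carlo average of the force under Gaussian noise.
def f_randomized(x, sigma=0.01, n=100000):
    eps = np.random.randn(n) * sigma
    return np.mean(f(x + eps))

# Analytic smoothing: smooth_max(0, y) = log(1 + exp(kappa * y)) / kappa.
def f_analytic(x, kappa=500.0):
    return np.logaddexp(0.0, -kappa * k * x) / kappa

# Both versions have informative gradients on either side of the contact event.
print(f(0.0), f_randomized(0.0), f_analytic(0.0))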

The last contribution addresses this shortcoming by combining smoothing-based contact mode abstraction with the global search capabilities of rapidly exploring random trees (RRT). Using a novel distance metric derived from the local smoothed models, the researchers enable RRT to search effectively despite the constraints imposed by contact dynamics.

Overall Contributions

Scientists determine the qualitative and empirical equivalence of randomized and analytic smoothing techniques on straightforward systems.

They show contact-rich manipulation planning can benefit greatly from a convex, differentiable formulation of quasi-dynamic contact dynamics and associated analytic smoothing. 

Researchers integrate contact mode smoothing with sampling-based motion planning to achieve effective global planning via highly rich contact dynamics, filling a gap in the spectrum of existing approaches.

Researchers clarify the mathematical meaning of smoothing a function and several strategies for computing its local approximations before discussing contact in complicated systems. Their goal is to present a unified picture of smoothing techniques and the relationships between them.

The researchers were motivated by the dramatic gap between RL’s empirical success on contact-rich problems and the struggle of model-based approaches on the same problems. They have shown that traditional model-based approaches can effectively tackle planning for contact-rich manipulation by identifying the pitfalls in existing model-based planning methods, understanding how RL alleviates those pitfalls, and resolving them with model-based techniques. By enabling efficient online planning on the order of a minute, and by generalizing across environments and tasks, the contribution offers a powerful alternative to existing RL tools that rely on heavy offline computation on the order of hours or days. They review some of the factors that made this possible.

The first flaw identified in model-based approaches, the need to explicitly enumerate and assess contact modes, is mitigated by the stochastic smoothing that RL performs implicitly. A second flaw is that second-order transients can cause short-sighted linearizations that do not help with long-term strategy. To address it, the researchers propose the Convex Differentiable Quasi-Dynamic Contact (CQDC) model and demonstrate its usefulness through numerous theoretical arguments and experiments. By examining the model’s structure, they also show that the contact dynamics can be relaxed analytically with a log barrier, and their studies demonstrate the computational advantages of analytic smoothing over randomized smoothing.

In conclusion, smoothing-based strategies have so far been tied to local trajectory optimization; compared with RL-based techniques that attempt a global search, they have proven less successful on challenging problems because of their susceptibility to local minima. Meanwhile, sampling-based motion planning (SBMP) techniques for contact-rich systems have avoided the trap of mode enumeration by explicitly taking contact modes into account. This work closes a gap among existing approaches by fusing mode smoothing with RRT, where the exploration phase of RRT is guided by a local approximation to the smooth surrogate based on a local Mahalanobis metric. Combining these three advancements enables efficient global motion planning for very contact-rich, high-dimensional systems. In the future, the researchers plan to use a streamlined version of the planner to drive policy search or perform real-time motion planning, and they anticipate that this will allow robots to find contact-rich plans online, in previously unexplored environments, within seconds of planning time.


Role of Data Contracts in Data Pipeline

What are Data Contracts?

A data contract is an agreement, or set of rules, defining how data should be structured and processed within a system. It serves as a crucial communication tool between different parts of an organization or between various software components, governing the management and intended usage of data exchanged between organizations, or sometimes between teams within a single company.

The primary purpose of a data contract is to ensure that data remains consistent and compatible across different versions or components of a system. A data contract includes the following:

Terms of Service: A description of how the data can be used, whether for development, testing, or deployment.

Service Level Agreements (SLA): SLAs describe the quality of data delivery and might include uptime, error rates, availability, etc. 

Similar to how business contracts outline responsibilities between suppliers and consumers of a product, data contracts establish and ensure the quality, usability, and dependability of data products.

What Metadata should be included in a Data Contract?

Schema: A schema is a set of rules and constraints placed on the columns of a dataset, and it provides useful information for data processing and analysis. Data sources evolve, so producers must be able to detect and react to schema changes, and consumers should still be able to process data written with the old schema.

Semantics: Semantics capture the rules of each business domain, including aspects like how business entities transition through the stages of their lifecycle and how they relate to one another. Like the schema, semantics can evolve over time.

Service Level Agreements (SLAs): SLAs specify the availability and freshness of data in a data product, helping data practitioners design consumption pipelines effectively. They include commitments such as the maximum expected delay before new data appears in the data product, and metrics like mean time between failures and mean time to recovery. A toy contract capturing these elements is sketched below.
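Here is a hypothetical, minimal data contract expressed in Python; the field names and allowed values are illustrative only:

contract = {
    "schema": {"order_id": str, "amount": float, "currency": str},
    "semantics": {"currency": ["USD", "EUR", "GBP"]},  # allowed business values
    "sla": {"max_delay_minutes": 60},                  # freshness commitment
}

def validate(record):
    """Return a list of contract violations for a single record."""
    errors = []
    for col, typ in contract["schema"].items():
        if col not in record:
            errors.append("missing column: " + col)
        elif not isinstance(record[col], typ):
            errors.append("bad type for column: " + col)
    for col, allowed in contract["semantics"].items():
        if record.get(col) not in allowed:
            errors.append("semantic violation for column: " + col)
    return errors

print(validate({"order_id": "A-1", "amount": 9.99, "currency": "JPY"}))
# -> ['semantic violation for column: currency']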

What is the significance of Data Contracts?

The primary benefit of a data contract is its role in ensuring compatibility and consistency between various versions of data schemas. Specifically, data contracts offer several advantages:

Compatibility Assurance: When a data contract is established to define data structure and rules, it guarantees that data produced and consumed by different components or system versions remain compatible. This proactive approach minimizes data processing complications during schema evolution.

Consistency Enforcement: Data contracts act as enforcers of consistency in data representation. They compel all producers and consumers to adhere to the same Schema, promoting data correctness and enhancing system reliability.

Version Control: Data contracts can undergo versioning and tracking over time. This capability enables structured management of changes to data schemas, which is invaluable for navigating schema evolution seamlessly.

Effective Communication: Data contracts are an effective communication tool among diverse organizational teams or components. They establish a shared understanding of data structures and formats, fostering collaboration.

Error Prevention: A well-defined data contract prevents errors, particularly schema mismatches or unexpected alterations, and facilitates early detection of schema-related issues.

Practical Ways to Enforce Data Contracts

Consider a data processing pipeline in which schema changes are managed in a Git repository and applied to the data-producing applications, ensuring consistent data structures. The applications send their data to Kafka topics, separating raw data from Change Data Capture (CDC) streams. A Flink app validates the raw data streams against schemas from the Schema Registry. Inaccurate data is directed to a Dead Letter Topic, while valid data is sent to a validated Data Topic that real-time applications can consume directly.

Furthermore, data from the validated Data Topic is stored for additional checks, including validation against specific Service Level Agreements (SLAs), and is then sent to the Data Warehouse for in-depth analysis. Should any SLAs be breached, consumers and producers receive alerts. Lastly, a recovery Flink app reviews the invalidated real-time data for potential fixes. This pipeline ensures data consistency, validation, and reliability throughout, facilitating efficient data analysis and monitoring; the core routing step is sketched below.
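A hedged sketch of the routing logic at the validation step, with plain Python standing in for the Flink job; the topic names and producer interface are illustrative, and validate() is the helper from the previous sketch:

def route(record, producer):
    # Send invalid records to the dead-letter topic, forward valid ones.
    errors = validate(record)
    if errors:
        producer.send("orders.dead-letter", {"record": record, "errors": errors})
    else:
        producer.send("orders.validated", record)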



This AI Paper Proposes MATLABER: A Novel Latent BRDF Auto-Encoder for Material-Aware Text-to-3D Generation

The development of 3D assets is essential for many commercial applications, including gaming, cinema, and AR/VR. Several labor-intensive and time-consuming steps are required in the traditional 3D asset development process, all of which depend on specialized knowledge and formal aesthetic training. Recent advances in generation quality and efficiency, as well as their potential to significantly reduce the time and skill requirements of traditional 3D asset creation, have drawn increasing attention to text-to-3D pipelines that automatically generate 3D assets from purely textual descriptions. 

These text-to-3D pipelines can produce engaging geometry and appearance by gradually optimizing the target 3D asset, expressed as a NeRF or DMTet mesh, through the Score Distillation Sampling (SDS) loss. As Figure 1 illustrates, however, they struggle to recover high-fidelity object materials, which severely restricts their use in real-world applications like relighting. Although attempts have been made to model the bidirectional reflectance distribution function (BRDF) and Lambertian reflectance in their designs, the neural network in charge of predicting materials lacks the cues necessary to identify an appropriate material that follows the natural distribution, particularly under fixed lighting, where the inferred material is frequently entangled with the environment lights.

In this study, researchers from Shanghai AI Laboratory and S-Lab, Nanyang Technological University, use rich material data that is already available to learn a text-to-3D pipeline that successfully separates material from ambient lighting. Although paired datasets of materials and text descriptions are unavailable, there are large-scale BRDF material datasets such as MERL BRDF, Adobe Substance3D materials, and the real-world BRDF collection TwoShotBRDF. The researchers therefore propose Material-Aware Text-to-3D via LAtent BRDF auto-EncodeR (MATLABER), which uses a novel latent BRDF auto-encoder to create realistic, natural-looking materials that precisely match the text prompts.

The latent BRDF auto-encoder is trained to embed real-world BRDF priors from TwoShotBRDF in a smooth latent space, so MATLABER predicts BRDF latent codes rather than raw BRDF values. This lets MATLABER concentrate on selecting the most appropriate material while worrying less about the validity of the predicted BRDF. Thanks to the smooth latent space of the BRDF auto-encoder, the method guarantees realistic, coherent object materials and achieves an effective decoupling of geometry and appearance. As illustrated in Figure 1, it can produce 3D assets with high-fidelity content, exceeding earlier state-of-the-art text-to-3D pipelines.

Figure 1: The goal of text-to-3D generation is to create high-quality 3D objects that correspond to provided text descriptions. Despite the striking visuals, representative techniques like DreamFusion and Fantasia3D continue to fall short in recovering high-fidelity object materials. Specifically, Fantasia3D forecasts BRDF materials entangled with ambient lighting while DreamFusion just takes into account diffuse materials. The method, which is based on a latent BRDF auto-encoder, can produce organic materials for 3D objects, enabling realistic renderings under various lighting conditions.
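For intuition, a latent auto-encoder of this kind can be sketched in a few lines of PyTorch. This is a generic toy with made-up dimensions, not the MATLABER architecture:

import torch
import torch.nn as nn

class BRDFAutoEncoder(nn.Module):
    """Toy auto-encoder: compress BRDF parameter vectors into a smooth latent space."""

    def __init__(self, brdf_dim=7, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(brdf_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, brdf_dim))

    def forward(self, brdf):
        latent = self.encoder(brdf)  # the text-to-3D stage predicts this code, not raw BRDFs
        return self.decoder(latent), latent

model = BRDFAutoEncoder()
reconstruction, latent = model(torch.rand(8, 7))  # batch of 8 toy BRDF vectors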

More crucially, an accurate estimate of object materials enables tasks like scene manipulation, material editing, and relighting that were previously difficult. These downstream tasks are essential in several real-world applications, opening the door to a more practical paradigm of 3D content generation. Additionally, using multi-modal datasets like ObjectFolder, the algorithm can infer tactile and acoustic information from the acquired materials, which together form the trinity of material properties for virtual objects.


Apple Researchers Propose an End-to-End Network Producing Detailed 3D Reconstructions from Posed Images

Have you ever played GTA-5? The 3D graphics in the game are widely admired. Unlike 2D graphics on a flat plane, 3D graphics simulate depth and perspective, allowing for more realistic and immersive visuals. These graphics are widely used in various fields, including video games, film production, architectural visualization, medical imaging, virtual reality, and more.

The traditional way to create a 3D model is to estimate depth maps for the input images and then fuse them into a 3D model. A team of researchers from Apple and the University of California, Santa Barbara instead developed a method that directly infers scene-level 3D geometry with deep neural networks, avoiding the traditional test-time optimization.

The traditional method produced missing geometry or artifacts in areas where the depth maps disagreed, typically on transparent or low-textured surfaces. The researchers’ approach instead back-projects image features onto a voxel grid and directly predicts the scene’s truncated signed distance function (TSDF) using a 3D convolutional neural network.

A Convolutional Neural Network (CNN) is a specialized artificial neural network designed for processing and analyzing visual data, particularly images and videos. The advantage of this technique is that the CNN can learn to produce smooth, consistent surfaces that fill the gaps in low-textured or transparent regions.

During training, the researchers used tri-linear interpolation to sample the ground-truth TSDF so that it aligns with the model’s voxel grid. This sampling added random noise to fine details, so they instead supervised predictions only at the exact points where the ground-truth TSDF is well defined, which improved the results by 10%.
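Tri-linear interpolation itself is standard: the value at a continuous query point is a distance-weighted blend of the eight surrounding voxel corners. A self-contained sketch (our illustration, not Apple’s code):

import numpy as np

def trilinear(grid, point):
    """Sample a 3D grid at a continuous point (x, y, z) given in voxel coordinates."""
    x0, y0, z0 = np.floor(point).astype(int)
    dx, dy, dz = point - np.floor(point)
    value = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                weight = ((dx if i else 1 - dx)
                          * (dy if j else 1 - dy)
                          * (dz if k else 1 - dz))
                value += weight * grid[x0 + i, y0 + j, z0 + k]
    return value

tsdf = np.random.randn(32, 32, 32)  # toy TSDF volume
print(trilinear(tsdf, np.array([10.3, 5.7, 20.1])))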

A voxel, short for volume pixel, represents a point in 3D space within a grid, similar to how a pixel represents a point in a 2D image. Existing systems use voxels of 4 cm or larger, which is not enough to resolve the geometric details visible in natural images, and increasing the voxel resolution is expensive. The researchers fixed this with a CNN grid feature that projects image features directly to the query point.

Sampling features from each input image at every voxel required a dense back-projection, which caused blurring in the back-projection volume. They solved this with an initial multi-view stereo depth estimate, which was then used to enhance the feature volume.

The researchers claim that their method is key to letting the network learn fine details, and that it allows the output resolution to be chosen freely without additional training or extra 3D convolution layers.


The Enigma for ChatGPT: PUMA is an AI Approach That Proposes a Fast and Secure Way for LLM Inference

Large Language Models (LLMs) have started a revolution in the artificial intelligence domain. The release of ChatGPT sparked the era of LLMs, and they have kept improving ever since. These models are made possible by massive amounts of data and have impressed us with their capabilities, from mastering language understanding to simplifying complex tasks.

Numerous alternatives to ChatGPT have been proposed, and they get better every day, even surpassing ChatGPT on certain tasks. LLaMA, Claude, Falcon, and more: the new LLMs are coming for ChatGPT’s throne.

However, there is no doubt that ChatGPT is still by far the most popular LLM out there. There is a good chance your favorite AI-powered app is just a ChatGPT wrapper handling the connection for you. But if we step back and look at it from a security perspective, is it really private and secure? OpenAI insists that protecting API data privacy is something it deeply cares about, yet it faces numerous lawsuits at the same time. Even with hard work on privacy and security, these models may be too powerful to be fully controlled.

So how do we utilize the power of LLMs without privacy and security concerns arising? How do we harness these models’ prowess without compromising sensitive data? Meet PUMA.

PUMA is a framework designed to enable secure and efficient evaluation of Transformer models, all while maintaining the sanctity of your data. It merges secure multi-party computation (MPC) with efficient Transformer inference.

At its core, PUMA introduces a novel technique to approximate the complex non-linear functions within Transformer models, like GeLU and Softmax. These approximations are tailored to retain accuracy while significantly boosting efficiency. Unlike previous methods that might sacrifice performance or lead to convoluted deployment strategies, PUMA’s approach balances both worlds – ensuring accurate results while maintaining the efficiency necessary for real-world applications.

PUMA introduces three pivotal entities: the model owner, the client, and the computing parties. Each entity plays a crucial role in the secure inference process. 

The model owner supplies the trained Transformer models, while the client contributes the input data and receives the inference results. The computing parties collectively execute secure computation protocols, ensuring that data and model weights remain securely protected throughout the process. The underpinning principle of PUMA‘s inference process is to maintain the confidentiality of input data and weights, preserving the privacy of the entities involved.

Secure embedding, a fundamental aspect of the secure inference process, traditionally involves the generation of a one-hot vector using token identifiers. Instead, PUMA proposes a secure embedding design that adheres closely to the standard workflow of Transformer models. This streamlined approach ensures that the security measures do not interfere with the inherent architecture of the model, simplifying the deployment of secure models in practical applications.

Overview of secure GeLU and LayerNorm protocols used in PUMA. Source: https://arxiv.org/pdf/2307.12533.pdf

Moreover, a major challenge in secure inference lies in approximating complex functions such as GeLU and Softmax in a way that balances computational efficiency with accuracy. PUMA tackles this by devising more accurate approximations tailored to the specific characteristics of these functions, significantly improving the precision of the approximation while optimizing runtime and communication costs.
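To give a flavor of what approximating GeLU means in this setting, here is an illustrative polynomial fit of our own; PUMA’s actual protocol splits the input range into pieces and is designed for MPC, but the principle is the same, since polynomials need only additions and multiplications, which are cheap to evaluate on secret-shared values:

import numpy as np

def gelu(x):
    # Tanh approximation of GeLU, as commonly used in Transformer implementations.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

# Fit a low-degree polynomial on a bounded interval.
xs = np.linspace(-4, 4, 2001)
coeffs = np.polyfit(xs, gelu(xs), deg=6)
approx = np.polyval(coeffs, xs)
print(np.max(np.abs(approx - gelu(xs))))  # worst-case error on the interval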

Finally, LayerNorm, a crucial operation within the Transformer model, presents unique challenges in secure inference due to the divide-square-root formula. PUMA addresses this by smartly redefining the operation using secure protocols, thus ensuring that the computation of LayerNorm remains both secure and efficient. 

One of the most important features of PUMA is its seamless integration. The framework facilitates end-to-end secure inference for Transformer models without necessitating major model architecture modifications. This means you can leverage pre-trained Transformer models with minimal effort. Whether it’s a language model downloaded from Hugging Face or another source, PUMA keeps things simple. It aligns with the original workflow and doesn’t demand complex retraining or modifications.


15 Artificial Intelligence (AI) And Machine Learning-Related Subreddit Communities in 2023

In the fast-paced world of Artificial Intelligence (AI) and Machine Learning, staying updated with the latest trends, breakthroughs, and discussions is crucial. Reddit, the front page of the internet, serves as a hub for experts and enthusiasts alike. Here’s our curated list of the top AI and Machine Learning-related subreddits to follow in 2023 to keep you in the loop.

r/MachineLearning

The sub is focused on machine learning and has regular technical and intriguing posts and discussions. There are a few basic behavioral rules for the subreddit. With over 2.5 million members, this is a must-join group for ML enthusiasts.

r/artificial

r/artificial is the largest subreddit dedicated to all issues related to Artificial Intelligence or AI. With over 167k members, one can find the latest news, examples of AI in practice, and discussions and questions from those working on or studying it. AI is a vast field that touches many disciplines and has many subfields. Many of these have subreddits dedicated to them as well. r/artificial is about all of these things. It is a platform for anyone interested in intelligent and respectful discussion of AI in any form. 

r/ArtificialInteligence

r/ArtificialInteligence is one of the most trending AI subreddits, and one where you don’t have to select a content flair. It has over 88k members. You can join this subreddit to stay updated on trending AI news.

r/Machinelearningnews

r/machinelearningnews is a community of machine learning enthusiasts, researchers, journalists, and writers who share interesting news and articles about the applications of AI. You will never miss updates on ML/AI/CV/NLP, because content is posted daily and highly moderated to avoid spam.

r/Automate

The sub has more than 75k members participating in discussions and posts focused on automation. Discussions about automation, additive manufacturing, robots, AI, and all the other technologies we’ve developed to enable a world without menial work can be found on the r/Automate subreddit.

r/singularity

The subreddit is dedicated to a thoughtful study of the hypothetical period when artificial intelligence develops to the degree of superior intelligence to that of humans, fundamentally altering civilization. With over 161k members, the sub has posts of excellent quality and relevance. It encompasses all aspects of the technological singularity and associated subjects, such as artificial intelligence (AI), human augmentation, etc. 

r/agi

With around 12.5k members, the sub is focused on Artificial General Intelligence. A machine with artificial general intelligence (AGI) is one that is capable of carrying out any intellectual work that a human can do. The posts are regular and informative, with creative discussions. 

r/compsci

Anyone interested in sharing and discussing information that computer scientists find fascinating should visit the r/compsci subreddit. This contains a lot of posts about AI. It has a few simple rules to abide by as members. The subreddit has over 2.1 million members.

r/AIethics

Ethics are fundamental in AI. r/AIethics has the latest content on how one can use and create various AI tools ethically. The rules are simple. It has over 3.2k members. The subreddit features discussions of how artificial intelligence agents should behave and how we should treat them.

r/cogsci

Although cognitive science is a large field, the subreddit features postings that in some way relate to the study of the mind from a scientific perspective, also featuring the latest AI. It features the interdisciplinary study of the mind and intelligence, embracing philosophy, psychology, artificial intelligence, neuroscience, linguistics, and anthropology. There are a few broad behavioral guidelines for users to abide by, and it has more than 107k members.

r/computervision

Computer vision is the branch of AI science that focuses on creating algorithms to extract useful information from raw photos, videos, and sensor data. The subreddit has excellent computer vision and artificial intelligence content. There are about 68k members. With expertise in computer science, machine learning, robotics, mathematics, and other fields, this community is home to academics and engineers developing and utilizing this interdisciplinary topic.

r/datascience

It features the latest content and discussions on Data Science and its related fields. The subreddit has over 817k members and active moderators. It serves as a forum for discussion and debate on matters related to the career of data scientists.

r/learnmachinelearning

The subreddit is dedicated to learning the latest machine-learning algorithms. It has a broad set of rules and features regular content related to the latest advancements in machine learning. The subreddit has over 276k members.

r/MLQuestions

r/MLQuestions is an excellent subreddit for questions and discussions related to machine learning. It has over 37.4k members and active discussions on various ML topics. It is a place for beginners to ask stupid questions and for experts to help them! 

r/neuralnetworks

The Subreddit is about Deep Learning, Artificial Neural Networks, and Machine Learning. It features various posts and regular discussions on these topics. It has over 21.8k members and is a great place to learn more about the latest AI.


Google Introduces MediaPipe for Raspberry Pi with an Easy-to-Use Python SDK for On-Device Machine Learning

In response to the exponentially growing demand for accessible machine learning (ML) tools on embedded systems, researchers have introduced an innovative solution designed to empower developers working with Raspberry Pi single-board computers. The new framework, MediaPipe for Raspberry Pi, offers a Python-based software development kit (SDK) tailored to facilitate various ML tasks. This development is a significant advancement in the realm of on-device ML, addressing the need for simplified and efficient tools.

The emergence of on-device machine learning has presented developers with unique resource limitations and complexity challenges. The Raspberry Pi, a popular platform for hobbyists and professionals alike, lacked a comprehensive SDK enabling users to utilize the power of machine learning in their projects seamlessly. This scarcity of accessible tools prompted the need for a user-friendly solution.

Before the introduction of MediaPipe for Raspberry Pi, developers often grappled with adapting generic machine learning frameworks to the capabilities of Raspberry Pi devices. The process was convoluted and demanded a deep understanding of both ML algorithms and hardware constraints, a challenge exacerbated by the absence of an SDK explicitly tailored to the Raspberry Pi ecosystem.

Researchers from various institutions have stepped forward to unveil a groundbreaking framework that addresses these issues. The MediaPipe for Raspberry Pi SDK results from collaborative efforts to streamline on-device ML development. The framework offers a Python-based interface that facilitates a range of machine-learning tasks, including audio classification, text classification, gesture recognition, and more. Its introduction signifies a significant leap forward in empowering developers of all backgrounds to seamlessly integrate machine learning into their Raspberry Pi projects.

MediaPipe for Raspberry Pi simplifies the development process by providing pre-built components that handle the intricacies of machine learning implementation on embedded systems. The SDK’s integration with OpenCV and NumPy further enhances its utility. The framework enables users to kickstart their projects by utilizing provided Python examples that cover various applications such as audio classification, facial landmarking, image classification, and more. Additionally, developers are encouraged to employ locally stored ML models to ensure optimal performance on their Raspberry Pi devices.
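A typical image-classification example with the MediaPipe Tasks Python API looks like the following; the model and image file paths are placeholders, and minor API details may differ across MediaPipe versions:

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

base_options = python.BaseOptions(model_asset_path="efficientnet_lite0.tflite")  # local model file
options = vision.ImageClassifierOptions(base_options=base_options, max_results=3)
classifier = vision.ImageClassifier.create_from_options(options)

image = mp.Image.create_from_file("photo.jpg")  # placeholder image path
result = classifier.classify(image)
for category in result.classifications[0].categories:
    print(category.category_name, round(category.score, 3))
classifier.close()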

While the MediaPipe for Raspberry Pi framework promises to enhance the ML development experience, it’s important to note that its performance varies across different Raspberry Pi models. Peak performance can be achieved on the Raspberry Pi 4 and Raspberry Pi 400 models due to their improved hardware capabilities. As the community embraces this framework, performance metrics across various use cases and device models will likely surface, contributing to a better understanding of its real-world impact.

The introduction of MediaPipe for Raspberry Pi underscores the commitment to democratizing machine learning by making it accessible to a broader audience. This user-friendly SDK not only addresses the existing challenges faced by developers in the realm of on-device ML but also paves the way for innovative projects that can harness the potential of embedded systems. As the framework gains traction, it is anticipated that developers will contribute to its growth by sharing their experiences, fine-tuning its performance, and expanding its capabilities. MediaPipe for Raspberry Pi marks a pivotal step in evolving on-device machine learning and offers a glimpse into the future of embedded system development.


How can Businesses Improve the Accuracy of Multilingual Product Classifiers? This AI Paper Proposes LAMM: An Active Learning Approach Aimed at Bolstering the Classification Accuracy in Languages with Limited Training Data

By capitalizing on shared representations common to different languages, cross-lingual learning is known to enhance the accuracy of NLP models on Low-Resource Languages (LRLs) that have limited data for model training. However, there remains a significant accuracy gap between high-resource languages (HRLs) and low-resource languages (LRLs), tied to the relative scarcity of LRL pre-training data even for state-of-the-art (SOTA) models. In professional contexts, language-level accuracy targets are frequently imposed. This is where techniques like neural machine translation, transliteration, and label propagation on similar data are useful, since they can synthetically augment the existing training data.

These methods can augment the quantity and quality of training data without resorting to prohibitively expensive manual annotation. However, because of the limitations of machine translation, translation usually improves LRL accuracy yet may still fall short of commercial goals.

A team of researchers from Amazon offers an approach to improving low-resource language (LRL) accuracy by employing active learning to collect labeled data selectively. Active learning for multilingual data has been studied before, although most work focuses on training a model for a single language; here, the goal is a single model that serves all languages effectively. The method, Language Aware Active Learning for Multilingual Models (LAMM), builds on prior work showing that active learning can improve model performance across languages with a single model. That prior approach, however, offers no way to specifically target and improve an LRL’s accuracy: because today’s state-of-the-art active learning algorithms keep requesting labels for languages that have already exceeded their accuracy targets, they waste manual annotations in situations where meeting language-level targets is essential. To improve LRL accuracy without negatively impacting HRL performance, the researchers present an active-learning-based strategy for collecting labeled data strategically. The proposed strategy, LAMM, increases the likelihood of achieving accuracy targets across all relevant languages.

The researchers frame LAMM as a multi-objective optimization problem (MOP). The objective is to pick unlabeled examples that are:

Uncertain (the model has low confidence in its predictions)

From languages for which the classifier’s performance still falls short of the accuracy targets (a toy sketch of this selection rule follows below).
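A toy sketch of such a selection rule (our illustration of the idea, not Amazon’s implementation): score each unlabeled example by predictive entropy, but spend the labeling budget only on languages still below their accuracy targets:

import numpy as np

def entropy(probs):
    # Predictive entropy per example; higher means the model is less certain.
    return -np.sum(probs * np.log(probs + 1e-12), axis=-1)

def select_for_labeling(probs, langs, lang_acc, targets, budget):
    """probs: (n, classes) model probabilities; langs: language of each example;
    lang_acc / targets: per-language current accuracy and accuracy targets."""
    below = {l for l, t in targets.items() if lang_acc[l] < t}  # languages missing targets
    scores = entropy(probs)
    eligible = [i for i in range(len(langs)) if langs[i] in below]
    return sorted(eligible, key=lambda i: -scores[i])[:budget]  # most uncertain first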

Amazon researchers compare LAMM’s performance against two baselines on four multilingual classification datasets, using the typical pool-based active learning setup: the public Amazon Reviews and MLDoc datasets, plus two multilingual product classification datasets used internally at Amazon. The baselines are:

Least Confidence (LC) gathers the most entropically uncertain samples possible.

Equal Allocation (EC), which divides the annotation budget equally across the languages and gathers high-entropy samples to fill each language’s budget.

They found that LAMM outperforms both baselines on all LRLs while only slightly underperforming on HRLs. Compared with LC, LAMM reduces the share of HRL labels by 62.1% while giving up just 1.2% in AUC. Across the four product classification datasets, two publicly available and two proprietary, LAMM improves LRL performance by 4–11% relative to strong baselines.


Announcing the Preview of Amazon SageMaker Profiler: Track and Visualize Detailed Hardware Performance Data for Your Model Training Workloads

Today, we’re pleased to announce the preview of Amazon SageMaker Profiler, a capability of Amazon SageMaker that provides a detailed view into the AWS compute resources provisioned during training deep learning models on SageMaker. With SageMaker Profiler, you can track all activities on CPUs and GPUs, such as CPU and GPU utilizations, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. In this post, we walk you through the capabilities of SageMaker Profiler.
SageMaker Profiler provides Python modules for annotating PyTorch or TensorFlow training scripts and activating SageMaker Profiler. It also offers a user interface (UI) that visualizes the profile, a statistical summary of profiled events, and the timeline of a training job for tracking and understanding the time relationship of the events between GPUs and CPUs.
The need for profiling training jobs
With the rise of deep learning (DL), machine learning (ML) has become compute and data intensive, typically requiring multi-node, multi-GPU clusters. As state-of-the-art models grow in size in the order of trillions of parameters, their computational complexity and cost also increase rapidly. ML practitioners have to cope with common challenges of efficient resource utilization when training such large models. This is particularly evident in large language models (LLMs), which typically have billions of parameters and therefore require large multi-node GPU clusters in order to train them efficiently.
When training these models on large compute clusters, we can encounter compute resource optimization challenges such as I/O bottlenecks, kernel launch latencies, memory limits, and low resource utilizations. If the training job configuration is not optimized, these challenges can result in inefficient hardware utilization and longer training times or incomplete training runs, which increase the overall costs and timelines for the project.
Prerequisites
The following are the prerequisites to start using SageMaker Profiler:

A SageMaker domain in your AWS account – For instructions on setting up a domain, see Onboard to Amazon SageMaker Domain using quick setup. You also need to add domain user profiles for individual users to access the SageMaker Profiler UI application. For more information, see Add and remove SageMaker Domain user profiles.
Permissions – The following list is the minimum set of permissions that should be assigned to the execution role for using the SageMaker Profiler UI application:

sagemaker:CreateApp
sagemaker:DeleteApp
sagemaker:DescribeTrainingJob
sagemaker:SearchTrainingJobs
s3:GetObject
s3:ListBucket

Prepare and run a training job with SageMaker Profiler
To start capturing kernel runs on GPUs while the training job is running, modify your training script using the SageMaker Profiler Python modules. Import the library and add the start_profiling() and stop_profiling() methods to define the beginning and the end of profiling. You can also use optional custom annotations to add markers in the training script to visualize hardware activities during particular operations in each step.
There are two approaches that you can take to profile your training scripts with SageMaker Profiler. The first approach is based on profiling full functions; the second approach is based on profiling specific code lines in functions.
To profile by functions, use the context manager smppy.annotate to annotate full functions. The following example script shows how to implement the context manager to wrap the training loop and full functions in each iteration:

import smppy

sm_prof = smppy.SMProfiler.instance()
config = smppy.Config()
config.profiler = {
    "EnableCuda": "1",
}
sm_prof.configure(config)
sm_prof.start_profiling()

for epoch in range(args.epochs):
    if world_size > 1:
        sampler.set_epoch(epoch)
    tstart = time.perf_counter()
    for i, data in enumerate(trainloader, 0):
        with smppy.annotate("step_" + str(i)):
            inputs, labels = data
            inputs = inputs.to("cuda", non_blocking=True)
            labels = labels.to("cuda", non_blocking=True)

            optimizer.zero_grad()

            with smppy.annotate("Forward"):
                outputs = net(inputs)
            with smppy.annotate("Loss"):
                loss = criterion(outputs, labels)
            with smppy.annotate("Backward"):
                loss.backward()
            with smppy.annotate("Optimizer"):
                optimizer.step()

sm_prof.stop_profiling()

You can also use smppy.annotation_begin() and smppy.annotation_end() to annotate specific lines of code in functions. For more information, refer to documentation.
Configure the SageMaker training job launcher
After you’re done annotating and setting up the profiler initiation modules, save the training script and prepare the SageMaker framework estimator for training using the SageMaker Python SDK.

Set up a profiler_config object using the ProfilerConfig and Profiler modules as follows:

from sagemaker import ProfilerConfig, Profiler

profiler_config = ProfilerConfig(
    profiler_params=Profiler(cpu_profiling_duration=3600)
)

Create a SageMaker estimator with the profiler_config object created in the previous step. The following code shows an example of creating a PyTorch estimator:

import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    framework_version="2.0.0",
    image_uri="763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker",
    role=sagemaker.get_execution_role(),
    entry_point="train_with_profiler_demo.py",  # your training job entry point
    source_dir=source_dir,  # source dir for your training script
    output_path=output_path,
    base_job_name="sagemaker-profiler-demo",
    hyperparameters=hyperparameters,  # if any
    instance_count=1,
    instance_type="ml.p4d.24xlarge",
    profiler_config=profiler_config,
)

If you want to create a TensorFlow estimator, import sagemaker.tensorflow.TensorFlow instead, and specify one of the TensorFlow versions supported by SageMaker Profiler. For more information about supported frameworks and instance types, see Supported frameworks.

Start the training job by running the fit method:

estimator.fit(wait=False)

Launch the SageMaker Profiler UI
When the training job is complete, you can launch the SageMaker Profiler UI to visualize and explore the profile of the training job. You can access the SageMaker Profiler UI application through the SageMaker Profiler landing page on the SageMaker console or through the SageMaker domain.
To launch the SageMaker Profiler UI application on the SageMaker console, complete the following steps:

1. On the SageMaker console, choose Profiler in the navigation pane.
2. Under Get started, select the domain in which you want to launch the SageMaker Profiler UI application. If your user profile only belongs to one domain, you will not see the option for selecting a domain.
3. Select the user profile for which you want to launch the SageMaker Profiler UI application. If there is no user profile in the domain, choose Create user profile. For more information about creating a new user profile, see Add and Remove User Profiles.
4. Choose Open Profiler.

You can also launch the SageMaker Profiler UI from the domain details page.
Gain insights from the SageMaker Profiler
When you open the SageMaker Profiler UI, the Select and load a profile page opens, as shown in the following screenshot.

You can view a list of all the training jobs that have been submitted to SageMaker Profiler and search for a particular training job by its name, creation time, and run status (In Progress, Completed, Failed, Stopped, or Stopping). To load a profile, select the training job you want to view and choose Load. The job name should appear in the Loaded profile section at the top.
Choose the job name to generate the dashboard and timeline. Note that when you choose the job, the UI automatically opens the dashboard. You can load and visualize one profile at a time. To load another profile, you must first unload the previously loaded profile. To unload a profile, choose the trash bin icon in the Loaded profile section.
For this post, we view the profile of an ALBEF training job on two ml.p4d.24xlarge instances.
After you finish loading and selecting the training job, the UI opens the Dashboard page, as shown in the following screenshot.

You can see the plots for key metrics, namely the GPU active time, GPU utilization over time, CPU active time, and CPU utilization over time. The GPU active time pie chart shows the percentage of GPU active time vs. GPU idle time, which enables you to check whether the GPUs are more active than idle throughout the entire training job. The GPU utilization over time timeline graph shows the average GPU utilization rate over time per node, aggregating all the nodes in a single chart. You can check whether the GPUs have an unbalanced workload, under-utilization issues, bottlenecks, or idle issues during certain time intervals. For more details on interpreting these metrics, refer to the documentation.
The dashboard provides you with additional plots, including time spent by all GPU kernels, time spent by the top 15 GPU kernels, launch counts of all GPU kernels, and launch counts of the top 15 GPU kernels, as shown in the following screenshot.

Lastly, the dashboard enables you to visualize additional metrics, such as the step time distribution, which is a histogram that shows the distribution of step durations on GPUs, and the kernel precision distribution pie chart, which shows the percentage of time spent on running kernels in different data types such as FP32, FP16, INT32, and INT8.
You can also obtain a pie chart on the GPU activity distribution that shows the percentage of time spent on GPU activities, such as running kernels, memory (memcpy and memset), and synchronization (sync). You can visualize the percentage of time spent on GPU memory operations from the GPU memory operations distribution pie chart.

You can also create your own histograms based on a custom metric that you annotated manually as described earlier in this post. When adding a custom annotation to a new histogram, select or enter the name of the annotation you added in the training script.
Timeline interface
The SageMaker Profiler UI also includes a timeline interface, which provides you with a detailed view into the compute resources at the level of operations and kernels scheduled on the CPUs and run on the GPUs. The timeline is organized in a tree structure, giving you information from the host level to the device level, as shown in the following screenshot.

For each CPU, you can track CPU performance counters such as clk_unhalted_ref.tsc and itlb_misses.miss_causes_a_walk. For each GPU on the two ml.p4d.24xlarge instances, you can see a host timeline and a device timeline. Kernel launches are on the host timeline and kernel runs are on the device timeline.
You can also zoom in to the individual steps. In the following screenshot, we have zoomed in to step_41. The timeline strip selected in the following screenshot is the AllReduce operation, an essential communication and synchronization step in distributed training, run on GPU-0. In the screenshot, note that the kernel launch in the GPU-0 host connects to the kernel run in the GPU-0 device stream 1, indicated with the arrow in cyan.

Availability and considerations
SageMaker Profiler is available for PyTorch (versions 2.0.0 and 1.13.1) and TensorFlow (versions 2.12.0 and 2.11.1). The following table provides the links to the supported AWS Deep Learning Containers for SageMaker.

Framework | Version | AWS DLC Image URI
PyTorch | 2.0.0 | 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker
PyTorch | 1.13.1 | 763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
TensorFlow | 2.12.0 | 763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.12.0-gpu-py310-cu118-ubuntu20.04-sagemaker
TensorFlow | 2.11.1 | 763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.11.1-gpu-py39-cu112-ubuntu20.04-sagemaker

SageMaker Profiler is currently available in the following Regions: US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland).
SageMaker Profiler is available in the training instance types ml.p4d.24xlarge, ml.p3dn.24xlarge, and ml.g4dn.12xlarge.
For the full list of supported frameworks and versions, refer to the documentation.
SageMaker Profiler incurs charges after the SageMaker Free Tier or the free trial period of the feature ends. For more information, see Amazon SageMaker Pricing.
Performance of SageMaker Profiler
We compared the overhead of SageMaker Profiler against various open-source profilers. The baseline used for the comparison was obtained from running the training job without a profiler.
Our key finding was that SageMaker Profiler generally resulted in a shorter billable training duration because it added less overhead time to the end-to-end training runs. It also generated up to 10 times less profiling data than the open-source alternatives. The smaller profiling artifacts generated by SageMaker Profiler require less storage, thereby also saving on costs.
Conclusion
SageMaker Profiler enables you to get detailed insights into the utilization of compute resources when training your deep learning models. This can enable you to resolve performance hotspots and bottlenecks to ensure efficient resource utilization that would ultimately drive down training costs and reduce the overall training duration.
To get started with SageMaker Profiler, refer to the documentation.

About the Authors
 Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS based in Munich, Germany. Roy helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS. Roy is passionate about computational optimization problems and improving the performance of AI workloads.
Sushant Moon is a Data Scientist at AWS, India, specializing in guiding customers through their AI/ML endeavors. With a diverse background spanning retail, finance, and insurance domains, he delivers innovative and tailored solutions. Beyond his professional life, Sushant finds rejuvenation in swimming and seeks inspiration from his travels to diverse locales.
Diksha Sharma is an AI/ML Specialist Solutions Architect in the Worldwide Specialist Organization. She works with public sector customers to help them architect efficient, secure, and scalable machine learning applications including generative AI solutions on AWS. In her spare time, Diksha loves to read, paint, and spend time with her family.

Hugging Face Introduces IDEFICS: Pioneering Open Multimodal Conversati …

In the dynamic landscape of artificial intelligence, a persistent challenge has cast a shadow over the progress of the field: the enigma surrounding state-of-the-art AI models. While undeniably impressive, these proprietary marvels have maintained an air of secrecy that hinders the march of open research and development. Bridging this gap, a dedicated research team at Hugging Face has orchestrated a remarkable breakthrough: the inception of IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS). This multimodal language model is not just a mere contender; it stands shoulder to shoulder with its closed proprietary counterparts in terms of capabilities.

Moreover, it operates with refreshing transparency, utilizing publicly available data. The driving force behind this endeavor is to encourage openness, accessibility, and collaborative innovation in AI. In a world craving open AI models that can adeptly handle both textual and image inputs to produce coherent conversational outputs, IDEFICS emerges as a beacon of progress.

While current methodologies are commendable, they remain entangled within proprietary confines. The visionaries steering IDEFICS, however, have a bolder proposition: an open-access model that mirrors the performance of its closed counterparts and relies solely on publicly available data. This visionary creation, rooted in the bedrock of Flamingo’s prowess, is offered in two incarnations: an 80 billion parameter variant and a 9 billion parameter variant. This divergence in scope ensures its adaptability across an array of applications. The research team’s aspiration goes beyond mere advancement; they seek to establish a paradigm of transparent AI development that addresses the void in multimodal conversational AI and sets the stage for others to follow.

IDEFICS takes the stage, a true prodigy among multimodal models. With an innate ability to ingest sequences of images and text, it transforms these inputs into contextual, coherent conversational text. This innovation dovetails seamlessly with the team's overarching mission of transparency, a trait woven into its fabric. The model is built on publicly available data and models, effectively demolishing entry barriers. The proof lies in its performance: IDEFICS astounds by effortlessly answering queries about images, vividly describing visual narratives, and even conjuring stories rooted in multiple images. Its 80 billion and 9 billion parameter variants offer scalability across a wide range of applications. This multimodal marvel, born of painstaking data curation and model development, opens a new chapter in the saga of open research and innovation.

https://huggingface.co/blog/idefics
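For readers who want to try the model, the following is a minimal inference sketch based on the open checkpoints published on the Hugging Face Hub and the transformers integration described in the blog post linked above. The checkpoint name refers to the 9-billion-parameter instruct variant, the image URL is a placeholder, and the exact processor and generation arguments should be checked against that post:

import torch
from transformers import AutoProcessor, IdeficsForVisionText2Text

# 9B instruct checkpoint on the Hugging Face Hub
checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# A prompt interleaving text and an image; replace the URL with a real image
prompts = [
    [
        "User: What is in this image?",
        "https://example.com/image.jpg",  # placeholder image URL
        "<end_of_utterance>",
        "\nAssistant:",
    ],
]
inputs = processor(prompts, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])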

A resounding response to the difficulties posed by closed proprietary models, IDEFICS emerges as a standard-bearer of open innovation. Beyond mere creation, this model symbolizes a stride towards accessible and collaborative AI development. The fusion of textual and image inputs, yielding a cascade of conversational outputs, heralds the advent of transformation across industries. The research team's devotion to transparency, ethical scrutiny, and shared knowledge crystallizes the latent potential of AI, poised to benefit humanity at large. In its essence, IDEFICS exemplifies the potency of open research in ushering in a new era of transcendent technology. As the AI community rallies behind this inspiring call, the boundaries of what's possible expand, promising a brighter, more inclusive digital tomorrow.

Check out the Reference Article. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, please follow us on Twitter

The post Hugging Face Introduces IDEFICS: Pioneering Open Multimodal Conversational AI with Visual Language Models appeared first on MarkTechPost.

14 Google Drive Add-Ons That Will Save Your Time Every Day

Quick Drive

If you use Google Drive, you've probably encountered a situation where you need to quickly look something up in one of your documents or sheets. If you have many files, bookmarking them all may be impractical. Quick Drive is a lasting answer to this issue. With this Google Drive extension, you can search without opening the app. To use Quick Drive, click its icon in Chrome's toolbar. A panel will then appear where you can type your search terms and quickly access the desired file or folder. This extension also provides easy access to recently used and favorite files and the ability to filter files by type.

Zapier

People’s productivity would halt without Google Drive, a popular online storage and collaboration platform. If you want to get the most out of Google Drive, you’ll need to connect it to several more programs. Zapier is handy because it provides numerous integrations that help speed up your processes. Users are given the keys to new realms of efficiency and cooperation through Zapier’s extensive Google Drive connectors. These integrations are innovative answers to common problems, such as automating file management, improving team communication, assuring data backup, gathering form data, and managing social media material. Zapier allows for an easier workflow that saves time and increases productivity by linking Google Drive with other necessary technologies. If you want to get the most out of Google Drive, you must embrace the potential of integrations.

DriveCast

Unfortunately, you can't cast videos directly from Google Drive. A media file stored in Google Drive cannot be cast straight from the cloud; it must be downloaded to the local computer before it can be played. DriveCast simplifies the procedure considerably. With this add-on, you can stream videos, music, and photos stored in Google Drive to any device that supports Chromecast. However, it only works with media files in Chromecast-compatible formats like JPEG, MP3, and MP4. After installation, launch DriveCast from the Chrome menu. Your Google Drive files and folders will open in a new tab. Find and open the file you wish to play, then choose DriveCast from the list of casting options. This extension is useful because it can play media files stored in Google Drive and broadcast live videos from external URLs. Keep in mind, though, that only live videos in Chromecast-compatible formats will play.

Sync and backup Google Drive

Nearly everyone shares the fear of data corruption or deletion. That's why saving a copy of your Google Drive data on another cloud service, such as Dropbox or OneDrive, is recommended. In the event that files are accidentally erased on one system, you will still have a copy on another. This backup procedure, however, is cumbersome; usually, you'll have to do it by hand. The Sync Google Drive plugin provides a workable solution. Dropbox, SharePoint, Gmail, Evernote, Box, OneDrive, and Egnyte are just a few of the online services that can receive G Suite files directly. The nicest aspect of this add-on is that it performs real-time synchronization, backup, and migration without requiring you to download files.

Screencastify

How can you efficiently capture, save, and distribute a Chrome browser screencast? The standard procedure would involve setting up a screen capture program, but many of these are cumbersome to run on low-end machines, prohibitively expensive, or both. The Screencastify plugin is a simple solution. First, it's an extension, so it works within the Chrome browser. Its straightforward interface also makes it a breeze to use. Best of all, there is a free version; if you upgrade to the paid version, you'll still spend less than $5 monthly. It lets you save your recordings as an MP3 audio file, an animated GIF, or an MP4 video file. With the Screencastify add-on, your recordings are instantly uploaded to Drive. It also generates a shareable link, so you can send the recorded video to anybody you like from your Drive.

Universal File Opener

Google Drive lets you store non-Google files, but compatibility issues mean not all of them can be opened online. Those files must be downloaded and then used with native programs. The Universal File Opener extension helps you access your files from both Google Drive and your computer. This program makes it easy to open files stored on Google Drive directly from within desktop programs without first downloading the content to your local machine. This includes documents, presentations, and spreadsheets created in Microsoft Office, for example. When you're done making edits in the native app of your choice, UFO will synchronize the updated version back to Google Drive. Changes you make to files stored elsewhere will be mirrored in real time on Google Drive. There will be one file containing all the revisions, rather than 30 with names like "Final_Final_V4".

Bookmarks Backuper

Bookmarks are handy for saving frequently visited websites and keeping tabs on your surfing history. The loss of your bookmarks, though, could mean losing access to your bookmarked pages. To save your bookmarks in Chrome, use the free Bookmarks Backuper plugin. It saves a copy of your bookmarks to Google Drive in case you accidentally delete them. Get the Bookmarks Backuper extension from the Chrome Web Store to get started. The bookmark backup process begins immediately after the extension is installed. The Bookmarks Backuper icon in Chrome’s menu bar allows manual bookmark backups. Using Bookmarks Backuper is a fantastic strategy for safeguarding your bookmarked pages. There’s no risk in giving it a shot because it’s simple to operate and available at no cost.

DriveSlides 

The DriveSlides Chrome add-on lets you quickly turn image folders into Google Slides presentations. It’s a straightforward program that will help you save a ton of time and effort. Obtain the DriveSlides extension from the Chrome Web Store to get started. After the add-on has been installed, access a Google Drive image folder. Select the photographs you wish to include in your presentation by clicking the DriveSlides icon in the Chrome menu bar. After selecting the photos you want to use, DriveSlides will make a new Google Slides presentation. Your presentation can be tailored to your specific needs with DriveSlides. The presentation’s look, content, and structure are all adjustable. Your presentation can be saved as a PDF or PowerPoint file, too. A Google Slides presentation may be made quickly and effortlessly with the help of DriveSlides. It’s great for making slideshows out of collections of images, such as those from a vacation or a product.

AwesomeDrive for Google Drive

Google Drive isn't designed to work with Office files. A file must be converted to a Google Drive-compliant format before it can be edited, and then redownloaded as an MS Office file afterward. This can be a major pain if you frequently make changes to Microsoft Office files on Drive. AwesomeDrive is a sophisticated answer to the issue. After adding this extension, your Google Drive account will allow you to generate Word, PowerPoint, and Excel documents. If you already have an Office file in your Google Drive, you can open it and make native edits.

Save to Google Drive

With the Save to Google Drive Chrome extension, you can easily save articles, photos, and videos from the web to your Google Drive. You can back up your favorite websites, save articles for later offline reading, or keep track of crucial information. Download the Save to Google Drive add-on from the Chrome Web Store to use the feature. A “Save to Google Drive” button will appear in the Chrome menu after installing the extension. Select text or a picture on the page, then click this button to save it to your Google Drive. The Save to Google Drive add-on allows you to save web pages in PDF, HTML, or MHTML format. Alternatively, you can select a destination folder in Google Drive to save the files. Chrome users could benefit from installing the free and straightforward Save to Google Drive extension.

Transfer Dropbox to Google Drive

The two most well-known cloud storage providers are Dropbox and Google Drive. Both have many advantages and disadvantages, yet they share many similarities. One difference between Google Drive and Dropbox is the amount of free space available to users. The Transfer Dropbox to Google Drive Chrome extension can ease the transition for current Dropbox users. With this add-on, you can move your Dropbox files to Google Drive quickly and effortlessly. The extension may be downloaded from the Chrome Web Store. After installing it, you can access your Dropbox and Google Drive files directly from the browser. When you click the add-on, a list of your Dropbox's contents will appear. Select the file(s) you want to upload, then hit the Transfer button. After that, the file will be uploaded to your Google Drive. The add-on also allows you to copy an entire Dropbox folder to Google Drive. Choose the directory you want to transfer and hit the Transfer button. The extension will then upload every file in that directory to your Google Drive.

Notes for Google Drive

The Notes: Keep Sticky Thoughts in Google Drive extension is a simple yet helpful way to take notes and organize your ideas within Google Drive. The add-on is a powerful tool for increasing efficiency due to its user-friendliness and feature set. Your new Google Drive note is saved as soon as you start typing, which means your notes remain accessible even if you log out of your account or restart your computer. The add-on takes the first line of your note and uses it as the title, so you can quickly locate specific notes whenever you need them. Drag and drop your notes to rearrange their order, which keeps your note-taking tidy. Use bold, italics, and lists to emphasize important points and improve the look and readability of your notes. The Google Drive folder where you keep your notes can be quickly accessed via the buttons provided, giving you speedy access to your notes.

Doc Builder

The Doc Builder software, available through the Google Workspace Marketplace, streamlines the process of creating and organizing documents. It has several tools for enhancing the readability of text, adding visual elements like pictures and tables, and working with others. Text in Doc Builder can be styled in several ways, including by changing its font, size, color, and alignment. Formatting options include bold, italics, and underlining. With Doc Builder, it's simple to add visual elements like pictures and tables to your documents; images and tables can be repositioned, resized, and aligned as needed. Doc Builder's real-time collaboration features allow you and your coworkers to edit documents simultaneously, with conflicts settled and modifications and comments tracked. You can quickly create documents with the help of Doc Builder's library of premade templates, which covers formats including resumes, proposals, and reports. You can organize your documents with Doc Builder's search, sort, and export tools, and use folders to keep related files together. Using Doc Builder to create and manage documents can save you time, effort, and stress. It's a good solution if you or your company needs to produce formal documents.

Mail Merge

The free Chrome add-on Mail Merge for Gmail makes it easy to send customized emails to a large list of people. It's a straightforward program that will save you a ton of time and effort. Before using Gmail's Mail Merge feature, you must compile the recipients' information in a spreadsheet. You can then use Mail Merge for Gmail to import the spreadsheet you produced. Next, write the email you intend to send. You can insert whatever you like in the email template, including text, graphics, and files. Mail Merge for Gmail will automatically add each recipient's name to their email before sending. Once you've reviewed the emails, you can send them in bulk. If your company or organization often sends mass emails, Gmail's Mail Merge feature is worth using. Individuals who wish to contact many people at once can also benefit from this service.

Don’t forget to join our 29k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com

If you like our work, please follow us on Twitter

The post 14 Google Drive Add-Ons That Will Save Your Time Every Day appeared first on MarkTechPost.

Watch and Learn Little Robot: This AI Approach Teaches Robots Generali …

Robots have always been at the center of attention in the tech landscape. They always found a place in sci-fi movies, kid shows, books, dystopian novels, etc. Not so long ago, they were just sci-fi dreams, but now they’re all over the place, reshaping industries and giving us a glimpse into the future. From factories to outer space, robots are taking center stage, showing off their precision and adaptability like never before. 

The main goal in the landscape of robotics has always been the same: mirror human dexterity. The quest for refining manipulation capabilities to mirror humans has led to exciting developments. Significant advancement has been made through the integration of eye-in-hand cameras, either as complements or substitutes for conventional static third-person cameras.

While eye-in-hand cameras hold immense potential, they do not guarantee error-free outcomes. Vision-based models often struggle with the real world’s fluctuations, such as changing backgrounds, variable lighting, and changing object appearances, leading to fragility. 

To tackle this challenge, a new set of generalization techniques has emerged recently. Instead of relying solely on vision data, these techniques teach robots action policies using diverse robot demonstration datasets. It works to some extent, but there is a major catch: it's expensive, really expensive. Collecting such data in a real robot setup involves time-consuming tasks like kinesthetic teaching or robot teleoperation through VR headsets or joysticks.

Do we really need to rely on this expensive dataset? Since the main goal of robots is to mimic humans, why can we not just use human demonstration videos? These videos of humans doing tasks offer a more cost-effective solution due to the agility of humans. Doing so enables capturing multiple demos without constant robot resets, hardware debugging, or arduous repositioning. This raises the intriguing possibility of leveraging human video demonstrations to enhance the generalization abilities of vision-centric robotic manipulators, at scale. 

However, bridging the gap between the human and robot realms isn't a walk in the park. The dissimilarities in appearance between humans and robots introduce a distribution shift that needs careful consideration. Let us look at new research, Giving Robots a Hand, that bridges this gap.

Existing methods, employing third-person camera viewpoints, have tackled this challenge with domain adaptation strategies involving image translations, domain-invariant visual representations, and even leveraging keypoint information about human and robot states.

Overview of Giving Robots a Hand. Source: https://arxiv.org/pdf/2307.05959.pdf

In contrast, Giving Robots a Hand takes a refreshingly straightforward route: masking a consistent portion of each image, effectively concealing the human hand or robotic end-effector. This straightforward method sidesteps the need for elaborate domain adaptation techniques, allowing robots to learn manipulation policies from human videos directly. Consequently, it solves issues arising from explicit domain adaptation methods, like glaring visual inconsistencies stemming from human-to-robot image translations.
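As a concrete illustration of this masking idea, here is a minimal sketch; the masked region (a fixed bottom strip of the frame) and the fill value are assumptions for illustration, not the paper's exact configuration:

import numpy as np

def mask_end_effector(frame: np.ndarray, mask_fraction: float = 0.25) -> np.ndarray:
    # frame: HxWx3 uint8 image from the eye-in-hand camera
    # mask_fraction: fraction of the image height to conceal (assumed value)
    masked = frame.copy()
    h = frame.shape[0]
    # Black out the strip where the human hand or robot gripper appears,
    # hiding the embodiment-specific pixels from the policy
    masked[int(h * (1 - mask_fraction)):, :, :] = 0
    return masked

Applying the same mask to both the human video frames and the robot camera frames means the policy never sees the pixels that differ between the two embodiments, which is what sidesteps explicit domain adaptation.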

The proposed method can train robots to perform a variety of tasks. Source: https://giving-robots-a-hand.github.io/

The key contribution of Giving Robots a Hand lies in the method's exploration: it integrates wide-ranging eye-in-hand human video demonstrations to enhance both environment and task generalization. It achieves strong performance across a range of real-world robotic manipulation tasks, encompassing reaching, grasping, pick-and-place, cube stacking, plate clearing, toy packing, and more. The proposed method improves generalization significantly, empowering policies to adapt to unfamiliar environments and novel tasks that weren't witnessed during robot demonstrations. It yields an average absolute improvement of 58% in success rates in unseen environments and tasks compared to policies trained solely on robot demonstrations.

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 29k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, please follow us on Twitter

The post Watch and Learn Little Robot: This AI Approach Teaches Robots Generalizable Manipulation Using Human Video Demonstrations appeared first on MarkTechPost.

Persistent Systems shapes the future of software engineering with Amaz …

Amazon CodeWhisperer, the AWS AI coding companion, is a step change in developer productivity tools. Based on generative AI technology, Amazon CodeWhisperer offers contextualized code snippets or recommendations based on natural language prompts to build software quickly, responsibly, and securely. It enables productivity gains and increases accuracy for accelerated digital transformations. Amazon CodeWhisperer ensures enterprises have greater control over AI-generated code, especially the code written by developers who may have a limited understanding of code attribution, quality, and security requirements.
Persistent Systems, a global digital engineering provider, has run several pilots and formal studies with Amazon CodeWhisperer that point to shifts in software engineering, generative AI-led modernization, responsible innovation, and more. This post highlights four themes emerging from Persistent’s Amazon CodeWhisperer experiments that could change software engineering as we know it.
Beyond productivity gains: Reimagining coding with Amazon CodeWhisperer
In this section, we discuss some of the ways that Amazon CodeWhisperer is reimagining coding.
Improving responsible delivery
Ownership, explainability, and transparency of AI-generated code are the most contentious points for the commercial adoption of coding companions such as Amazon CodeWhisperer. Amazon gives developers complete ownership of the code they write using Amazon CodeWhisperer. The Amazon CodeWhisperer team has carefully curated the training data and omitted restrictive licenses, ensuring developers don't inadvertently use restrictively licensed code when they use Amazon CodeWhisperer. In addition, because recommender pipelines can be strongly influenced by open-source code, if Amazon CodeWhisperer detects a lineage, it flags the license references (for example, MIT or Apache licenses from open-source projects). This enables the developer to attribute code snippets to the source owners, instituting coding best practices. Although Amazon collects data such as code snippets, recommendations, and comments from files open in the integrated development environment, for Amazon CodeWhisperer Professional users these are not stored or used to train the model. Also, Amazon CodeWhisperer Individual users can opt out of sharing content with AWS, limiting the chances of their code being reproduced as recommendations to other users.

Persistent’s approach to generative AI mirrors Richard P. Feynman’s thinking, who said, “I would rather have questions that can’t be answered than answers that can’t be questioned.” Persistent prioritizes responsibility, accountability, and transparency to build client trust. One example of the potential of Amazon CodeWhisperer lies in its ability to reference code, helping clients circumvent legal liabilities that could derail other rewards. For more information about Persistent’s approach to generative AI, refer to Generative AI Services and Solutions.
Moving code security upstream and upfront
Seasoned developers will tell you that security cannot be tested in; it must be built from the ground up. Although some approaches, such as DevSecOps, make it easier for developers, code security experts, and operations teams to embed security testing while the code is written, Amazon CodeWhisperer takes this one step further. It runs security scans on the code directly in the integrated development environment (IDE), allowing a single developer to test the code for quality and security. This highly automated, shift-left approach to security testing enables enterprises to catch defects upstream and remedy them at a fraction of the cost and time. Especially now, as generative AI brings coding closer to business users, the automated, in-line security scans in Amazon CodeWhisperer will mean less rework, faster time to production, and more resilient code.

Persistent helps leading global organizations fortify their business applications with code embedded with security guardrails. It believes security testing has to shift closer to the developer (professional or citizen) and be encoded into applications as they are written. Amazon CodeWhisperer, with its transformative power to fast-track not just coding but secure coding, fits well into the narrative.
Enabling developer skills to undergo a reboot
Most developers must undergo at least 4 months of training before being assigned to projects. In our pilot, Amazon CodeWhisperer condensed the training period to 1 month, with a reduced cognitive load for understanding the context or coding language. We see this bearing on how companies hire developers: evaluating not coding knowledge, which is increasingly abstracted away, but prompt engineering expertise and the ability to be creative with tools such as Amazon CodeWhisperer.
The parameters for evaluating professional developers will change quickly, depending on their ability to tune the input to get the desired answer. This also opens the field to citizen developers or business technologists, bringing coding closer to the business.
Driving implementation closer to strategy
With so many moving parts, businesses and their technology partners will return to the whiteboard together. The engagement model will evolve to factor in these new variables (such as faster coding timelines, secure code, more citizen developers, or domain-oriented developers) unleashed by Amazon CodeWhisperer. Coding will now move closer to the business, automatically incorporating security guardrails and mandatory regulations into software applications as they are written, all at scale. And with verticalized workloads, success will depend on the development team’s domain expertise and the ability to translate code into innovation. This means the implementation of the company’s vision through this code will become even more watertight because it adheres to strategic pillars of security, quality, and speed.
From long shots to offshoots – what the future holds
We extrapolated these themes to map a future where Amazon CodeWhisperer can help realize “delivery moon shots” that, up until now, were aspirational. The future looks something like this:

Zero-wastage – Amazon CodeWhisperer, especially with its proactive security scans and reference tracker tool, will ensure the code is of shippable quality, enabling every allied function—from business to developers—to add value and minimize wastage in terms of effort, time to value, or rework. This will bring a singular focus on the core job for each stakeholder, further enforcing a value-first mindset.

Zero ramp-up – The ability to support multiple coding languages, factor developer notes and comments into code suggestions, and offer lines of code on the fly makes Amazon CodeWhisperer the perfect antidote to the cold start problem for developers. As mentioned, developers don't need a gestation period before being onboarded on a project. This dramatically cuts down the time to value, allowing implementation partners to dynamically deploy resources across projects for better monetization.

Zero-shot translation – Amazon CodeWhisperer supports multiple programming languages, such as Python, Java, JavaScript, TypeScript, SQL, and more. It will be able to translate code from one programming language to another, or what is called zero-shot translation ability, where it uses reference code in language A to write code in language B more accurately. This unleashes significant changes in how legacy modernization projects are planned and implemented. With the zero-shot translation ability of Amazon CodeWhisperer, Persistent is confident legacy modernization will become faster and no longer be a moon shot.

Zero lifting – Amazon CodeWhisperer is optimized to generate accurate code for other AWS offerings, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB. The accurate code generation makes the lift easy. Because AWS and other major cloud service providers are now pushing forward a multi-cloud narrative, Persistent expects Amazon CodeWhisperer to improve accuracy while recommending code for other solutions offered by AWS peers. This makes the road smoother for multi-cloud or multi-platform settings, eliminating the heavy lifting required while shifting workloads from one service vendor to another—supercharging digital transformation 2.0.
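As a purely hypothetical illustration of this kind of assistance (the comment prompts, the generated snippets, and the bucket and table names below are illustrative, not actual CodeWhisperer output), a developer might type a comment and accept suggestions along these lines:

import boto3

# upload a local file to an Amazon S3 bucket
s3 = boto3.client("s3")
s3.upload_file("results.csv", "my-example-bucket", "reports/results.csv")

# put an item into an Amazon DynamoDB table
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")
table.put_item(Item={"id": "123", "status": "complete"})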

Conclusion
Amazon CodeWhisperer goes beyond improving developer productivity: it democratizes coding and brings it closer to business users while ensuring best practices such as code attribution and enhanced security are never out of the purview.
Persistent is excited about Amazon CodeWhisperer and its potential impact on businesses and partners. It is working to create an Amazon CodeWhisperer-ready developer workforce and alerting its customers about its benefits to drive adoption. Persistent’s strong partnership with AWS makes it the best-fit technology partner to help businesses capitalize on the intrinsic value of Amazon CodeWhisperer.
To learn more about Persistent’s generative AI philosophy that reimagines the way software is engineered today and how Amazon CodeWhisperer aligns with it, refer to Generative AI Services and Solutions.

About the authors
Dr. Pandurang Kamat is Chief Technology Officer, responsible for advanced technology research focused on unlocking business value through innovation at scale. He is a seasoned technology leader who helps customers improve user experience, optimize business processes, and create new digital products. His vision for Persistent is to be an innovation powerhouse that anchors a global and diverse innovation ecosystem comprising academia and start-ups. He holds a bachelor's degree in Computer Engineering from Goa University and a Ph.D. in Computer Science from Rutgers University. He is a well-published author with several international research publications, an ACM-India Eminent Speaker, serves on the boards of studies at universities, and mentors technology start-ups.
Ankur Desai is a Principal Product Manager within the AWS AI Services team.
Kiran Randhi works for Amazon Web Services as a Principal Partner Solutions Architect in Seattle, Washington. He works closely with AWS Global Strategic SI partners to develop and implement effective cloud strategies that allow them to fully leverage the benefits of cloud technology. Kiran helps CIOs, CTOs, and architects turn their cloud visions into reality by providing architectural guidance and expertise throughout the implementation of strategic cloud solutions. He focuses on AWS security, Migration & Modernization, Data & Analytics, and other technologies to build solutions for different industries in the cloud.