DeepMind at NeurIPS ’23

Research

Published: 8 December 2023

Towards more multimodal, robust, and general AI systems

Next week marks the start of the 37th annual conference on Neural Information Processing Systems (NeurIPS), the largest artificial intelligence (AI) conference in the world. NeurIPS 2023 takes place December 10-16 in New Orleans, USA. Teams from across Google DeepMind are presenting more than 180 papers at the main conference and workshops. We’ll be showcasing demos of our cutting-edge AI models for global weather forecasting, materials discovery, and watermarking AI-generated content. There will also be an opportunity to hear from the team behind Gemini, our largest and most capable AI model. Here’s a look at some of our research highlights:

Multimodality: language, video, action

UniSim is a universal simulator of real-world interactions.

Generative AI models can create paintings, compose music, and write stories. But however capable these models may be in one medium, most struggle to transfer those skills to another. We delve into how generative abilities could help models learn across modalities. In a spotlight presentation, we show that diffusion models can be used to classify images with no additional training required. Diffusion models like Imagen classify images in a more human-like way than other models, relying on shapes rather than textures. What’s more, we show how just predicting captions from images can improve computer-vision learning. Our approach surpassed current methods on vision and language tasks, and showed more potential to scale.

More multimodal models could pave the way for more useful digital and robot assistants that help people in their everyday lives. In a spotlight poster, we create agents that could interact with the digital world like humans do: through screenshots, and keyboard and mouse actions. Separately, we show that by leveraging video generation, including subtitles and closed captioning, models can transfer knowledge by predicting video plans for real robot actions.

One of the next milestones could be to generate realistic experience in response to actions carried out by humans, robots, and other types of interactive agents. We’ll be showcasing a demo of UniSim, our universal simulator of real-world interactions. This type of technology could have applications across industries, from video games and film to training agents for the real world.

Building safe and understandable AI

An artist’s illustration of artificial intelligence (AI). This image depicts AI safety research. It was created by artist Khyati Trehan as part of the Visualising AI project launched by Google DeepMind.

When developing and deploying large models, privacy needs to be embedded at every step of the way. In a paper recognized with the NeurIPS best paper award, our researchers demonstrate how to evaluate privacy-preserving training with a technique that is efficient enough for real-world use. For training, our teams are studying how to measure whether language models are memorizing data, in order to protect private and sensitive material. In another oral presentation, our scientists investigate the limitations of training through “student” and “teacher” models that have different levels of access and vulnerability if attacked.

Large language models (LLMs) can generate impressive answers, but are prone to “hallucinations”: text that seems correct but is made up. Our researchers raise the question of whether a method for finding where a fact is stored (localization) can enable editing that fact. Surprisingly, they found that locating a fact and editing that location does not edit the fact, hinting at the complexity of understanding and controlling stored information in LLMs.

With Tracr, we propose a novel way of evaluating interpretability methods by translating human-readable programs into transformer models. We’ve open-sourced a version of Tracr to serve as a ground truth for evaluating interpretability methods.
