It’s no secret that AI technology is rapidly advancing. The latest development, however, could be one of the most impressive in AI and large language models (LLMs). Tech giant Google recently unveiled a new LLM that could one day help humans and dolphins communicate. It’s called DolphinGemma.

DolphinGemma

The model is specifically trained on the WDP’s acoustic database of wild Atlantic spotted dolphins. Photo: Google

DolphinGemma is an AI model developed by Google that builds on the company’s audio technologies. According to Google, its SoundStream tokenizer “efficiently” represents dolphin sounds, which are then processed by a model architecture suited to complex sequences. At roughly 400 million parameters, the model is specifically sized to run on Pixel phones.
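Google hasn’t published the tokenizer’s internals, but the basic idea of turning raw audio into a sequence of discrete tokens can be sketched. The toy Python below swaps the trained SoundStream neural codec for a random codebook and plain vector quantization; the frame size, codebook size, and tokenize helper are illustrative stand-ins, not the real implementation.

```python
# Illustrative sketch only: a toy vector-quantization tokenizer in the spirit
# of SoundStream. The real tokenizer is a trained neural codec; a random
# codebook stands in here so the shape of the idea is runnable end to end.
import numpy as np

rng = np.random.default_rng(0)

SAMPLE_RATE = 48_000   # assumed rate; dolphin whistles and clicks run high
FRAME = 960            # 20 ms frames at 48 kHz
CODEBOOK_SIZE = 1024   # number of distinct "sound tokens"

# Stand-in codebook; a real codec learns these entries during training.
codebook = rng.normal(size=(CODEBOOK_SIZE, FRAME))

def tokenize(waveform: np.ndarray) -> np.ndarray:
    """Map raw audio to a sequence of discrete token IDs, one per frame."""
    n_frames = len(waveform) // FRAME
    frames = waveform[: n_frames * FRAME].reshape(n_frames, FRAME)
    # Nearest-codebook-entry lookup: ||a - b||^2 = ||a||^2 - 2ab + ||b||^2.
    dists = (
        (frames ** 2).sum(axis=1, keepdims=True)
        - 2.0 * frames @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)

# One second of fake "dolphin audio" becomes 50 tokens.
audio = rng.normal(size=SAMPLE_RATE)
print(tokenize(audio)[:10])  # e.g. [417 233 ...]
```

Once audio is reduced to token IDs like these, a sequence model can treat dolphin sounds much the way a text LLM treats words.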

There’s a reason it is designed to run on Pixel phones: the Wild Dolphin Project (WDP) uses these devices during its underwater dolphin research. Google is collaborating with researchers at Georgia Tech and drawing on the WDP’s field research to advance DolphinGemma.

A primary focus for WDP is observing and analyzing the dolphins’ natural communication and social interactions. Working underwater allows the researchers to link sounds to specific behaviors. They have documented correlations such as signature whistles used by mothers and calves to reunite, burst-pulse “squawks” often heard during fights, and click “buzzes” often used during courtship or while chasing sharks.

Google says the new model leverages Gemma, its family of lightweight open models built from the same research and technology behind its Gemini models. It’s trained extensively on WDP’s acoustic database of wild Atlantic spotted dolphins. DolphinGemma functions as an audio-in, audio-out model, meaning it processes sequences of natural dolphin sounds to identify patterns and structures, and ultimately predicts the “likely subsequent sounds in a sequence.” In that respect, it works much like LLMs for human language, which predict the next word or token in a sentence.
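To make that analogy concrete, here is a deliberately tiny sketch of next-token prediction. DolphinGemma is a transformer at roughly 400M parameters; the bigram counter and sample token stream below are invented purely to show the “predict the likely next sound” idea.

```python
# Illustrative only: next-token prediction in miniature. Given the sounds
# seen so far, score what tends to come next in the training stream.
from collections import Counter

# Hypothetical token stream, as produced by an audio tokenizer.
tokens = [3, 7, 3, 7, 3, 9, 3, 7, 3, 7]

# Count how often each token follows each other token.
bigrams = Counter(zip(tokens, tokens[1:]))

def predict_next(prev: int) -> int:
    """Return the most frequent follower of `prev` in the stream."""
    followers = {b: c for (a, b), c in bigrams.items() if a == prev}
    return max(followers, key=followers.get)

print(predict_next(3))  # -> 7: after sound-token 3, token 7 is most likely
```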

Because DolphinGemma can identify recurring sound patterns, clusters, and reliable sequences, it helps researchers uncover “hidden structures” and “potential meanings” in dolphin communication, work that once required immense human effort. Google hopes it could one day help establish a shared vocabulary with dolphins for interactive communication.
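The pattern-finding step can be pictured like this: if each vocalization is summarized as a feature vector, off-the-shelf clustering can group similar sounds for researchers to review. The features below are fabricated and k-means is a stand-in; the source doesn’t say which techniques Google and WDP actually use.

```python
# Illustrative only: surfacing recurring sound types automatically.
# Each row is a made-up feature vector for one vocalization (e.g. spectral
# summaries); k-means groups similar sounds for human inspection.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Fake features for 60 vocalizations drawn from three "sound types".
whistles = rng.normal(loc=0.0, scale=0.3, size=(20, 4))
squawks  = rng.normal(loc=3.0, scale=0.3, size=(20, 4))
buzzes   = rng.normal(loc=6.0, scale=0.3, size=(20, 4))
features = np.vstack([whistles, squawks, buzzes])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(labels)  # three tidy groups: candidate signature whistles, squawks, buzzes
```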

Analyzing Dolphin Sounds With Pixel Phones

WDP also wants to explore two-way interaction using technology in the ocean. This led to the development of the CHAT (Cetacean Hearing Augmentation Telemetry) system. CHAT is designed to establish a simpler, shared vocabulary.

According to Google, “The concept first relies on associating novel, synthetic whistles (created by CHAT, distinct from natural dolphin sounds) with specific objects the dolphins enjoy, like sargassum, seagrass, or scarves the researchers use.”

Researchers hope that the dolphins, who are naturally curious, will learn to mimic the synthetic whistles to request these items. As more natural sounds are understood, they can also be added to the system. Google says, “A Google Pixel 6 handled the high-fidelity analysis of dolphin sounds in real-time.”
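The article doesn’t describe how CHAT decides which synthetic whistle a dolphin has mimicked, but the matching step can be framed as comparing incoming audio against the known whistle templates. Everything below, from the whistle frequencies to the best_match helper, is hypothetical; a real system would likely rely on trained detectors rather than simple correlation.

```python
# Illustrative only: match an incoming sound against the synthetic whistles
# tied to objects and pick the best fit via normalized correlation.
import numpy as np

SR = 48_000
t = np.linspace(0, 0.5, SR // 2, endpoint=False)

# Hypothetical synthetic-whistle templates, one per object.
templates = {
    "sargassum": np.sin(2 * np.pi * 8_000 * t),
    "seagrass":  np.sin(2 * np.pi * 11_000 * t),
    "scarf":     np.sin(2 * np.pi * 14_000 * t),
}

def best_match(sound: np.ndarray) -> str:
    """Return the object whose template correlates most with the sound."""
    def score(tpl: np.ndarray) -> float:
        return abs(np.dot(sound, tpl)) / (np.linalg.norm(sound) * np.linalg.norm(tpl))
    return max(templates, key=lambda k: score(templates[k]))

# A noisy mimic of the "seagrass" whistle should still match.
mimic = templates["seagrass"] + 0.2 * np.random.default_rng(2).normal(size=len(t))
print(best_match(mimic))  # -> "seagrass"
```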

The company says using smartphones eliminates the need for custom hardware, improves system maintainability, lowers power consumption, and shrinks the device’s cost and size, which are major benefits during field research in the open ocean.