Google rolls out AI model “Gemini Pro”, “Gemini Ultra” to beat GPT-4




Google’s long-awaited Gemini Pro AI model finally debuts in Bard, albeit in a smaller version with fewer capabilities. With Ultra, Google is teasing a larger Gemini model for early 2024 that is supposed to beat OpenAI’s GPT-4.

According to Google, Gemini Pro is a competitor to OpenAI’s year-old GPT-3.5 AI model and outperforms it in six out of eight benchmarks. An even more compact version, Nano, comes in two sizes (1.8B and 3.25B parameters) and is optimized for on-device use on Android. The Nano models are distilled from the larger Gemini models.

Pro and Nano are available through Google Cloud, and Google says they run on its own TPU AI chips. Google does not specify parameter counts for the larger models. Google says that Gemini, like other providers’ LLMs, still struggles with hallucinations.

Google is offering three sizes of the Gemini model: Nano is optimized for mobile devices, Pro is the GPT-3.5 competitor, and Ultra is supposed to surpass GPT-4 and will be available in early 2024. | Image: Google

The largest version of Gemini, Ultra, is expected to outperform OpenAI’s GPT-4 on popular benchmarks for text and image understanding and code generation. Ultra will be released in early 2024 and will also be integrated into an “advanced” version of the Bard chatbot (see below).

According to Google, Gemini Ultra should outperform OpenAI GPT-4 on popular benchmarks. An independent study has yet to be done. | Image: Google AI
Google Gemini is allegedly good enough for state-of-the-art results in many tasks, especially in audio and video understanding. | Image: Google AI

Google’s benchmark results still need to be confirmed by independent third-party testers. More benchmark results are available from Google DeepMind.

Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks — notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined.

Google DeepMind, Technical Report

As expected, Gemini is multimodal, meaning it can handle text, images, audio, video, and code. Gemini does not currently offer image generation to users, but according to the technical documentation, the model is capable of it, and the feature will probably be introduced over time. Gemini can be prompted with images, text, or a combination of the two.

Google Gemini can also generate images based on text, images, or both. | Image: Google

The following video demonstrates Gemini’s multimodal capabilities.

Try Gemini Pro in Google Bard

Google is integrating Gemini with Bard in two phases. Beginning today, Bard will use a customized English-language version of Gemini Pro that provides enhanced features for understanding, summarizing, planning, and coding. This English version is available in more than 170 countries and territories, according to Google.

According to Google, Gemini Pro outperformed GPT-3.5 in six out of eight benchmarks, including Massive Multitask Language Understanding (MMLU) and GSM8K, which measures elementary school-level math problem-solving skills. Google also says that Bard was rated the preferred free chatbot over ChatGPT in blind tests by third-party raters.
