In May, MosaicML released what was then one of the best open-source language models, and now the startup is following up with a bigger and more powerful version.
After MPT-7B, MosaicML has released MPT-30B, its second major open-source language model. The new model has 30 billion parameters, and MosaicML claims it surpasses the performance of OpenAI’s GPT-3 despite having about one-sixth as many parameters (GPT-3 has 175 billion).
In some areas, such as coding, it is said to outperform open-source models such as Meta’s LLaMA or Falcon; in other areas, it is on par or slightly worse. As is usual with vendor-reported benchmarks, these claims are difficult to verify independently at this time. Like its predecessor, MPT-30B can be used for commercial purposes and comes in two variants: MPT-30B-Instruct, a model trained to follow short instructions, and the chatbot model MPT-30B-Chat.
MPT-30B comes with a longer context window
MPT-30B has also been trained on longer sequences (up to 8,000 tokens) than GPT-3, LLaMA or Falcon (2,000 tokens each). This context length, which is half that of the latest “GPT-3.5-turbo” variant, makes it well suited for use cases where a lot of text or code needs to be processed at once. According to MosaicML, the sequence length could also easily be doubled during fine-tuning or inference with additional optimization.
As an example, the company cites applications in industries such as healthcare or banking that do not want to hand over their data to OpenAI. The extended context window could be used to interpret lab results and provide insights into a patient’s medical history by analyzing different inputs.
MosaicML targets OpenAI’s proprietary platform
MPT-30B is also said to be more computationally efficient than Falcon or LLaMA, running on a single graphics card with 80 gigabytes of memory. Naveen Rao, co-founder and CEO of MosaicML, explained that the Falcon model, with its 40 billion parameters, could not run on a single GPU.
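A rough calculation shows why the parameter count matters for the single-GPU claim. The sketch below is an illustration only and rests on assumptions not stated in the article: weights stored in 16-bit precision (2 bytes per parameter), 1 GB counted as 10^9 bytes, and no allowance for activations or the key-value cache. The function names are hypothetical.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# Assumptions (not from the article): 16-bit weights (2 bytes per parameter),
# 1 GB = 10**9 bytes. Activations and the key-value cache need extra memory
# on top of this and are ignored here.

BYTES_PER_PARAM_16BIT = 2
GPU_MEMORY_GB = 80  # the single 80 GB card mentioned in the article


def weight_memory_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_16BIT) -> float:
    """Approximate size of the model weights in gigabytes."""
    return num_params * bytes_per_param / 1e9


def headroom_gb(num_params: float, gpu_memory_gb: float = GPU_MEMORY_GB) -> float:
    """Memory left on the GPU after loading the weights (may be zero or negative)."""
    return gpu_memory_gb - weight_memory_gb(num_params)


# Parameter counts as given in the article.
models = {"MPT-30B": 30e9, "Falcon-40B": 40e9}

for name, params in models.items():
    print(
        f"{name}: ~{weight_memory_gb(params):.0f} GB of weights, "
        f"~{headroom_gb(params):.0f} GB left on an {GPU_MEMORY_GB} GB card"
    )
```

Under these assumptions, a 30-billion-parameter model leaves some room for activations on an 80 GB card, while a 40-billion-parameter model's weights alone would fill it. Lower-precision formats would change the picture for both models.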
However, Rao sees proprietary platforms like OpenAI as the real competition; open-source projects are ultimately all on the same team, he said. He emphasized that open-source language models are “closing the gap to these closed-source models.” OpenAI’s GPT-4 is still clearly superior, he said, but open-source alternatives have now “crossed the threshold where these models are actually extremely useful.”