Fine-tuned Meta Code Llama outperforms GPT-4 in key benchmark


Shortly after Meta released its Code Llama code model, the open-source community fine-tuned it and immediately achieved a new top score, surpassing OpenAI’s GPT-4.

Phind, an AI co-programming startup, has announced that it has achieved a new high score on the HumanEval benchmark, an important evaluation test for AI programming tasks, with a fine-tuned 34B variant of Meta’s just-released Code Llama.

In the first run, the fine-tuned standard and Python models scored 67.6 and 69.5 percent, respectively. OpenAI’s GPT-4 scored 67 percent on the same benchmark when it was released in March. According to Meta, the standard Code Llama model with 34 billion parameters scored 48.8 percent, while the Python variant scored 53.7 percent.

Model                                       HumanEval result
Phind 34B standard model                    67.6%
Phind 34B Python model                      69.5%
GPT-4 (OpenAI model)                        67%
Meta Code Llama 34B                         48.8%
Meta Code Llama 34B Python                  53.7%
Meta Unnatural Code Llama (not released)    62%
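
For readers unfamiliar with how such percentages come about: HumanEval consists of 164 programming problems, and a generated solution counts as correct only if it passes the problem's unit tests. The article does not say which metric was used, so the sketch below assumes the usual pass@k measure with k = 1. The function names and the figure of 114 solved problems are purely illustrative and are not taken from Phind's results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that pass the unit tests, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results, k: int = 1) -> float:
    """Mean pass@k over all problems, in percent.

    results: list of (n, c) pairs, one per problem.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical run: one sample per problem, 114 of 164 problems solved.
results = [(1, 1)] * 114 + [(1, 0)] * 50
print(round(benchmark_score(results), 1))  # 69.5
```

With one sample per problem, pass@1 reduces to the share of problems solved, which is why a score maps directly to a count of solved tasks.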

The two Phind models were fine-tuned natively on a custom dataset of about 80,000 high-quality programming tasks and solutions. According to Phind, Meta had already produced its own fine-tuned variant, Unnatural Code Llama, which reaches 62 percent on HumanEval. However, Meta used only 15,000 examples for that fine-tuning.



The Phind models were trained in three hours on 32 A100 80 GB GPUs with a sequence length of 4096 tokens. The researchers used DeepSpeed ZeRO 3 and Flash Attention 2 for faster and more efficient training.
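
To make the reported setup more concrete, here is a minimal sketch of what a DeepSpeed configuration with ZeRO stage 3 might look like, together with the compute budget implied by the figures above. Only the GPU count, training time, sequence length and ZeRO stage come from the article. The micro-batch size, gradient accumulation and bf16 setting are assumptions, and this is not Phind's actual configuration. Flash Attention 2 is enabled on the model side and is not part of this file.

```python
import json

# Figures reported in the article.
NUM_GPUS = 32
TRAIN_HOURS = 3
SEQ_LEN = 4096

# Assumed values, chosen only for illustration.
MICRO_BATCH_PER_GPU = 2
GRAD_ACCUM_STEPS = 1

ds_config = {
    "train_micro_batch_size_per_gpu": MICRO_BATCH_PER_GPU,  # assumption
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,        # assumption
    "bf16": {"enabled": True},                              # assumption
    "zero_optimization": {
        "stage": 3,  # ZeRO 3: shard parameters, gradients and optimizer state
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

# Total compute: 32 GPUs for 3 hours.
gpu_hours = NUM_GPUS * TRAIN_HOURS

# Upper bound on tokens per optimizer step, if every sequence is full length.
tokens_per_step = NUM_GPUS * MICRO_BATCH_PER_GPU * GRAD_ACCUM_STEPS * SEQ_LEN

print(json.dumps(ds_config, indent=2))
print(f"{gpu_hours} GPU-hours, up to {tokens_per_step} tokens per step")
```

The dictionary can be written to a JSON file and passed to a DeepSpeed-based training script. The 96 GPU-hours follow directly from the article's numbers, while the tokens-per-step figure depends entirely on the assumed batch settings.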

Phind has published both models under the Llama license on Hugging Face.

Open-source community accelerates Meta’s AI development

The Llama license allows both scientific and commercial use, but commercial use is restricted: widely used applications require a special license. In addition, data generated with Llama 2 may not be used to train new AI models.

Numerous community refinements of Meta’s Llama 2 language model also now outperform Meta’s original release in benchmarks. This is likely Meta’s goal: to improve its models faster with the help of the open-source community.
