A new OpenAI team is tackling the challenge of superintelligence alignment to ensure that future AI systems, which are much smarter than humans, follow human intent.
The team, co-led by Ilya Sutskever and Jan Leike, is dedicated to finding scientific and technical breakthroughs to ensure the safe control of AI systems, which could bring unprecedented progress but also prove dangerous by potentially causing unintended consequences for humanity.
The ambitious goal of this new team is to create “the first automated alignment researcher” with human-level capabilities. The team expects to “iteratively align superintelligence” using “vast amounts of compute” and, in just four years, solve the core technical challenges of superintelligence alignment. OpenAI is dedicating 20% of today’s secured computing power to this goal.
Superintelligence alignment is fundamentally a machine learning problem, and we think great machine learning experts—even if they’re not already working on alignment—will be critical to solving it.
Recently, there has been growing criticism that the dystopias of extinction by a super-AI are designed to distract from the current dangers of AI.
“An incredibly ambitious goal”
To achieve this “incredibly ambitious goal,” the team plans to develop a scalable training method, validate the resulting model, and stress test their alignment pipeline.
They plan to focus on scalable monitoring and generalization, which can help provide a training signal for tasks that are difficult for humans to evaluate. In addition, they plan to automate the search for problematic behavior and problematic internal processes to validate system alignment, and to evaluate the entire pipeline using adversarial testing.
While acknowledging that their research priorities may change, the team intends to learn more about the problem and potentially incorporate new areas of research into their approach. OpenAI promises to “share the fruits of this effort broadly” and is looking for researchers and engineers to join the effort.
The new team’s work will complement ongoing projects at OpenAI aimed at improving the safety of current models and understanding and mitigating other AI-related risks, such as misuse, economic disruption, disinformation, bias and discrimination, and addiction and dependency.