Microsoft’s UFO abducts traditional user interfaces for a smarter Windows experience



Summary

Traditional user interfaces may fade into the background as AI technologies advance. With UFO, Microsoft is demonstrating how easy it could be to interact with Windows in the future.

Microsoft has developed an agent framework called UFO that can autonomously carry out user requests within Windows.

UFO stands for “UI-Focused Agent” and is based on GPT-4V, OpenAI’s vision-capable version of GPT-4. It analyzes the graphical user interface and the controls of Windows apps so that it can navigate within and between them.

Demonstration of a multi-step task that UFO can handle. | Image: Zhang et al.

UFO combines two agents that make decisions about which apps and controls to select to handle user requests. The AppAgent selects the right app, while the ActAgent performs specific actions in the selected app. A control interaction module translates the selected actions into executable operations.
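The division of labor described above can be illustrated with a minimal Python sketch. This is not UFO's code: the class and function names (other than the AppAgent and ActAgent roles named in the article), the keyword matching, and the fixed action plans are all assumptions that stand in for the GPT-4V calls and for real UI automation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Action:
    """One step chosen by the ActAgent (hypothetical structure)."""
    control: str        # label of the UI control to operate on
    operation: str      # e.g. "click" or "set_text"
    argument: str = ""  # optional payload, such as text to type


class AppAgent:
    """Selects an application. Keyword matching stands in for the model call."""

    def __init__(self, app_keywords: Dict[str, List[str]]):
        self.app_keywords = app_keywords

    def select_app(self, request: str) -> Optional[str]:
        text = request.lower()
        for app, keywords in self.app_keywords.items():
            if any(keyword in text for keyword in keywords):
                return app
        return None


class ActAgent:
    """Plans actions inside the chosen app. A fixed lookup stands in for the model call."""

    def __init__(self, plans: Dict[str, List[Action]]):
        self.plans = plans

    def plan(self, app: str) -> List[Action]:
        return self.plans.get(app, [])


class ControlInteraction:
    """Translates abstract actions into executable operations (here: log entries)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def execute(self, action: Action) -> None:
        if action.operation == "click":
            self.log.append(f"click:{action.control}")
        elif action.operation == "set_text":
            self.log.append(f"set_text:{action.control}={action.argument}")
        else:
            # Mirrors the idea that only supported operations can be performed.
            raise ValueError(f"unsupported operation: {action.operation}")


def handle_request(request: str, app_agent: AppAgent, act_agent: ActAgent,
                   controls: ControlInteraction) -> List[str]:
    """Route a request: pick the app, plan actions, then execute each one."""
    app = app_agent.select_app(request)
    if app is None:
        return []
    for action in act_agent.plan(app):
        controls.execute(action)
    return controls.log


# Usage with made-up data: one request routed to Outlook.
app_agent = AppAgent({"Outlook": ["email", "mail"], "File Explorer": ["folder", "file"]})
act_agent = ActAgent({"Outlook": [Action("Message body", "set_text", "Hello"),
                                  Action("Send", "click")]})
log = handle_request("Send an email saying hello", app_agent, act_agent, ControlInteraction())
print(log)
```

Keeping the translation step in its own class means the agents only have to name a control and an operation, while everything specific to the automation backend stays in one place.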


Image: Zhang et al.

Microsoft evaluated UFO’s performance using WindowsBench, a benchmark consisting of 50 user requests in nine common Windows applications such as Outlook, PowerPoint, File Explorer, and Adobe’s Acrobat Reader.

UFO completed 86 percent of the tasks, a significantly higher success rate than baselines such as GPT-3.5 and GPT-4. For those baselines, humans carried out the models’ instructions by hand, whereas UFO executes its actions itself via GPT-4V.

UFO also required fewer steps on average and took more security precautions, such as avoiding irreversible file deletion.
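One way to picture such a precaution is a guard that holds back sensitive operations until the user confirms them. The Python sketch below is only an illustration under that assumption: the operation names, the `SENSITIVE_OPERATIONS` set, and the `confirm` callback are hypothetical and not taken from UFO.

```python
from typing import Callable, List, Tuple

# Hypothetical set of operations treated as irreversible or otherwise sensitive.
SENSITIVE_OPERATIONS = {"delete_file", "overwrite_file", "send_email"}

Step = Tuple[str, str]  # (operation, target)


def run_with_safeguard(
    actions: List[Step],
    confirm: Callable[[str, str], bool],
) -> Tuple[List[Step], List[Step]]:
    """Execute actions in order, skipping sensitive ones the user does not confirm."""
    executed: List[Step] = []
    skipped: List[Step] = []
    for operation, target in actions:
        if operation in SENSITIVE_OPERATIONS and not confirm(operation, target):
            skipped.append((operation, target))
            continue
        # A real agent would perform the UI operation here.
        executed.append((operation, target))
    return executed, skipped


# Usage: the user declines every sensitive step, so the deletion is skipped.
actions = [("open_file", "report.docx"), ("delete_file", "report.docx")]
executed, skipped = run_with_safeguard(actions, confirm=lambda op, target: False)
print(executed, skipped)
```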

Image: Zhang et al.

However, the researchers acknowledge significant limitations of their system. UFO can only perform the control operations and actions supported by the Python package pywinauto and Windows UI Automation. They also note that UFO struggles when confronted with unusual application GUIs.

Microsoft plans to improve UFO by supporting alternative backends and by integrating dedicated GUI models for visual recognition. In addition, connecting to online search engines as an external knowledge base could improve the agent’s ability to adapt to unknown GUIs.
