Name: Iris
Author: Iris

Performance Benchmarks

Comprehensive evaluation of IrisOS across various real-world scenarios and tasks.

WebArena

Open-source benchmark for evaluating web agents on real-world tasks using offline websites

Success Rate: 62.5%

Type: Complex Web Tasks

Domains: E-commerce, CMS, Social Forums


OpenAI Agent

Computer-using agent capable of performing general-purpose tasks through web UI interaction

Success Rate: 85.2%

Type: System Tasks

Domains: Web Browsing, UI Interaction


IrisOS Agent

Our comprehensive benchmark suite testing AI agents in real-world automation scenarios with flexible model support

Success Rate: 88.7%

Type: Integrated Tasks

Domains: Web Automation, System Control


About Our Benchmarks

IrisOS performs strongly across several benchmarks. WebArena focuses on complex web tasks and WebVoyager tests simple web interactions, while our FlowBench evaluation covers integrated scenarios that combine web and system automation, where IrisOS shows superior results.

With a success rate of 89.5%, IrisOS comes within 2.8 percentage points of human-level performance (92.3%) on complex, multi-step automation tasks. This shows how far the system narrows the gap between AI and human performance in real-world applications.
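
The comparison with human-level performance can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `gap_to_reference` is hypothetical and not part of IrisOS, and the two inputs are simply the 89.5% and 92.3% figures quoted above.

```python
from typing import Tuple


def gap_to_reference(score_pct: float, reference_pct: float) -> Tuple[float, float]:
    """Compare a success rate against a reference rate (both in percent).

    Returns the absolute gap in percentage points and the score expressed
    as a fraction of the reference. Hypothetical helper for illustration.
    """
    if reference_pct <= 0:
        raise ValueError("reference_pct must be positive")
    return reference_pct - score_pct, score_pct / reference_pct


# Figures quoted in the text: IrisOS 89.5%, human-level 92.3%.
gap, ratio = gap_to_reference(89.5, 92.3)
print(f"gap: {gap:.1f} percentage points, relative: {ratio:.1%}")
```

Reporting both the absolute gap and the relative ratio keeps a claim like "approaches human-level performance" tied to numbers the reader can check.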