Performance Benchmarks
Comprehensive evaluation of IrisOS across various real-world scenarios and tasks.
WebArena
Open-source benchmark for evaluating web agents on real-world tasks using offline websites
OpenAI Agent
Computer-using agent capable of performing general-purpose tasks through web UI interaction
IrisOS Agent
Our comprehensive benchmark suite testing AI agents in real-world automation scenarios with flexible model support
About Our Benchmarks
IrisOS performs strongly across benchmarks that measure different things. WebArena focuses on complex web tasks and WebVoyager tests simpler web interactions, while our FlowBench evaluation covers integrated scenarios that combine web and system automation, which is where IrisOS shows superior results.
With a success rate of 89.5%, IrisOS approaches human-level performance (92.3%) on complex, multi-step automation tasks, a gap of 2.8 percentage points. This result indicates that the system can narrow the gap between AI and human performance in real-world applications.
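To make the comparison concrete, here is a minimal Python sketch of how a success rate and its distance from a human baseline can be computed. The function names are hypothetical and chosen for illustration only; they are not part of IrisOS or any benchmark harness. The only figures used are the two percentages quoted above.

```python
def success_rate(tasks_passed: int, tasks_total: int) -> float:
    """Return the share of completed tasks as a percentage."""
    if tasks_total <= 0:
        raise ValueError("tasks_total must be positive")
    if not 0 <= tasks_passed <= tasks_total:
        raise ValueError("tasks_passed must be between 0 and tasks_total")
    return 100.0 * tasks_passed / tasks_total


def gap_to_baseline(agent_rate: float, baseline_rate: float) -> float:
    """Absolute gap in percentage points (baseline minus agent)."""
    return round(baseline_rate - agent_rate, 1)


def relative_to_baseline(agent_rate: float, baseline_rate: float) -> float:
    """Agent success rate expressed as a percentage of the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return round(100.0 * agent_rate / baseline_rate, 1)


# Figures quoted in the text above (both in percent).
AGENT_RATE = 89.5
HUMAN_RATE = 92.3

if __name__ == "__main__":
    print("Gap (percentage points):", gap_to_baseline(AGENT_RATE, HUMAN_RATE))
    print("Share of human baseline (%):", relative_to_baseline(AGENT_RATE, HUMAN_RATE))
```

Reporting both the absolute gap and the relative share is a deliberate choice in this sketch: the absolute gap is easier to compare across benchmarks, while the relative share shows how close the agent is to the human baseline on this particular one.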