Posts, thoughts, code guides, and PDFs.
A summary of Anthropic's guide on building evaluations for AI agents, from grader types to practical roadmaps.