'D T01 25-26.xlsx', 'D T02 25-26.xlsx', 'D T03 25-26.xlsx', 'D T04 25-26.xlsx', 'D T05 25-26.xlsx', 'D T06 25-26.xlsx', 'D T07 25-26.xlsx', 'D T08 25-26.xlsx' df ...
Abstract: Although large language models (LLMs) have demonstrated impressive performance in code generation, they still face challenges when dealing with complex code generation tasks. In the software ...
code-agent-eval is a TypeScript library for evaluating prompts against coding agents (Claude Code, Cursor, etc.). Test prompt reliability by running each prompt multiple times, capturing code changes, and ...
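The idea of checking reliability through repeated runs can be illustrated with a minimal sketch. This is not code-agent-eval's API, which the description above does not show; `pass_rate`, `run_once`, and `fake_run` are hypothetical names, and the agent call is replaced by a deterministic stand-in so the example is reproducible.

```python
from typing import Callable


def pass_rate(run_once: Callable[[int], bool], runs: int = 10) -> float:
    """Call run_once(i) for each run index and return the fraction that passed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for i in range(runs) if run_once(i))
    return passed / runs


# Hypothetical stand-in for "send the prompt to an agent and check the result".
# Every third run (index 0, 3, 6, ...) fails, purely for illustration.
def fake_run(i: int) -> bool:
    return i % 3 != 0


rate = pass_rate(fake_run, runs=9)
print(f"pass rate over 9 runs: {rate:.2f}")
```

In a real setup, `run_once` would invoke the agent and apply whatever success check fits the task; the fraction of passing runs is one simple, assumed measure of reliability.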
Autoresearch is an autonomous overnight iteration engine for Claude Code: you point it at your codebase, give it a measurable goal and a shell command to verify progress, and it runs a tight Modify → ...