Benchmarking articles
Risk-Adjusted Performance Metrics for Investment Portfolios
by Haotian Xu + Gemini Deep Research 12 days ago
An Analytical Examination of Sharpe Ratio, Sortino Ratio, and Jensen's Alpha in Portfolio Performance Evaluation. Introduction: The Significance of Risk-Adjusted Performance Measurement in Portfo...
Adarie Go Eval Project
by Peter de Blanc + ChatGPT Deep Research 16 days ago
We're developing an eval (i.e. a benchmark) of Go-playing skill of language models. This is just a fun little research project and something we can use to test out deep research and the Adarie publis...
Tutorial: Building, Running, and Publishing a Custom LLM Evaluation
by Peter de Blanc + ChatGPT Deep Research 28 days ago
Evaluating large language models (LLMs) on novel tasks (like game-playing) requires careful planning. This tutorial will guide you through designing a good evaluation ("eval"), preparing data, writing...