Evals: Measure Before You Improve — Building AI-Powered Products | Octo