TIL: An LLM Foundry Metric Must Have “Accuracy” in Its Name

Categories: ml, llms, evals, llm-foundry
Author: Greg Gandenberger

Published: February 19, 2025

Note

This is a TIL (“Today I Learned”) post. I expect it to be useful to my future self and maybe to others, but it is meant to be a quick, informal way to capture something I learned rather than a polished presentation.

My last post described limitations of LLM Foundry’s default evaluation procedure for open-ended math problems and suggested that we could do better by creating a custom metric that uses Math-Verify.

I encountered a “gotcha” in the process of creating that custom metric: the metric must have “Accuracy” in its name. Otherwise, LLM Foundry’s evaluation script skips it!
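
To make the gotcha concrete, here is a minimal Python sketch. The substring filter is illustrative only rather than a copy of LLM Foundry’s actual code, and the `MathVerifyAccuracy` class, its `torchmetrics.Metric` base class, and the placeholder scoring helper are assumptions about what such a metric might look like, not the metric from my last post.

```python
import torch
from torchmetrics import Metric


def score_with_math_verify(generation: str, reference: str) -> bool:
    """Placeholder: the real metric would call Math-Verify here to check
    mathematical equivalence rather than exact string match."""
    return generation.strip() == reference.strip()


class MathVerifyAccuracy(Metric):
    """Hypothetical custom metric. "Accuracy" appears in its name, so a
    name-based filter like the one sketched below does not drop it."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.add_state("correct", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("total", default=torch.tensor(0.0), dist_reduce_fx="sum")

    def update(self, generations: list[str], references: list[str]) -> None:
        for generation, reference in zip(generations, references):
            self.correct += float(score_with_math_verify(generation, reference))
            self.total += 1.0

    def compute(self) -> torch.Tensor:
        return self.correct / self.total


# Illustrative only: the kind of substring check that causes the gotcha.
# A metric whose name lacks "Accuracy" never makes it into the results.
metric_names = ["MathVerifyScore", "MathVerifyAccuracy"]
reported = [name for name in metric_names if "Accuracy" in name]
print(reported)  # ['MathVerifyAccuracy'] -- "MathVerifyScore" is dropped
```

In my case, the fix was simply to include “Accuracy” in the metric’s name so that it survives this kind of filter.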

I created this GitHub issue to track the problem. I hope that we can eliminate this behavior, or at least make it more obvious.