Accepted | Workshop Paper (10 pages) | CHI-25 Workshop — Human-Centered Evaluation and Auditing of Language Models