Signals vs scores
Signals are runtime observations emitted while a task is executing.

- Use signals for facts about execution:
  - `tests_failed = 3`
  - `retrieval_hit = true`
  - `phase = "rerank"`
- Use scores for judgments computed after execution:
  - exact match
  - faithfulness
  - pass rate
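To make the distinction concrete, a score such as exact match is just a pure function over finished results, computed after the run. This is an illustrative sketch, not part of the ZeroEval API:

```python
def exact_match(prediction: str, expected: str) -> float:
    """Score computed after execution: 1.0 if the normalized
    prediction equals the expected answer, else 0.0."""
    return 1.0 if prediction.strip().lower() == expected.strip().lower() else 0.0

# Scores are judgments over recorded outputs, not runtime observations.
results = [("Paris", "paris"), ("Lyon", "Marseille")]
scores = [exact_match(pred, exp) for pred, exp in results]
print(scores)  # [1.0, 0.0]
```

A signal, by contrast, would be emitted from inside the task while it runs, before any expected answer is compared.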
Outputs vs signals vs scores
- Return task outputs from the dictionary returned by your `@ze.task(...)` function.
- Emit signals during execution for runtime observations.
- Compute scores after execution with evaluation functions.
- `prediction` is a task output
- `phase` and `retrieved_doc_count` are emitted signals
- exact match or faithfulness would be computed later as scores
Emit signals in a task
`ze.emit_signal(...)` attaches to the current task span by default. If no active span exists, it falls back to the current trace.
What gets persisted
When tasks run through `dataset.eval(...)`, ZeroEval creates a task span for each
row execution. Emitted signals are attached to that span and flushed with the
normal tracing pipeline.
This means signals inherit all of the benefits of tracing:
- per-row trace linkage
- span-level inspection
- compatibility with screenshots, attachments, and tags
Surface signals in the app
Signals appear in trace-aware evaluation views:
- row detail panels can show emitted signals for a specific result
- trace views can show which span emitted each signal
- eval detail views can summarize common signals across a run
Signals, feedback, and judges
- signals are runtime facts emitted by the task or system
- feedback is corrective human/user input
- judge evaluations are automated judgments produced by judge automations
In practice:
- emit signals during execution
- compute scores after execution
- use feedback to correct or supervise model behavior