Supported multimodal helpers

The Dataset class provides these helpers for attaching media to existing rows:
  • add_image(row_index, column_name, image_path)
  • add_audio(row_index, column_name, audio_path)
  • add_video(row_index, column_name, video_path)
  • add_media_url(row_index, column_name, media_url, media_type)

Build a multimodal dataset

import zeroeval as ze

ze.init()

ds = ze.Dataset(
    "medical-xray",
    data=[{"row_id": "p001", "symptoms": "Dry cough", "expected_keywords": "pneumonia"}],
    description="Symptoms with chest imaging",
)

ds.add_image(0, "chest_xray", "./assets/p001.jpg")
ds.add_audio(0, "doctor_note_audio", "./assets/p001.wav")
ds.add_media_url(
    0,
    "external_reference",
    "https://example.com/reference/p001.jpg",
    media_type="image",
)

ds.push()

What gets stored

  • The file-based helpers (add_image, add_audio, add_video) read the file and encode its content as a data URI (data:<mime>;base64,...)
  • The URL-based helper (add_media_url) stores the URL string as-is
This lets task code consume a consistent row schema regardless of data origin.
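To make the data-URI encoding concrete, here is a minimal sketch of how a local file becomes a `data:<mime>;base64,...` string. This is not the SDK's actual implementation — `to_data_uri` is a hypothetical helper shown for illustration, using only the standard library:

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    # Guess the MIME type from the file extension (e.g. .png -> image/png);
    # fall back to a generic binary type if the extension is unknown.
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Because both origins end up as plain strings in the row, task code can treat `row.chest_xray` (a data URI) and `row.external_reference` (a URL) uniformly.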

Use multimodal fields in tasks

@ze.task(outputs=["diagnosis"])
def diagnose(row):
    # row.chest_xray may be a data URI string
    # row.external_reference may be a URL string
    result = call_model(
        symptoms=row.symptoms,
        image=row.chest_xray,
    )
    return {"diagnosis": result}
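If your model client handles inline data and remote URLs differently, you can branch on the field's prefix. The helpers below (`is_data_uri`, `media_source`) are hypothetical names, not part of the SDK — just a sketch of one way to classify a field before dispatching:

```python
def is_data_uri(value: str) -> bool:
    # Data URIs produced by the file-based helpers always start with "data:".
    return value.startswith("data:")

def media_source(value: str) -> str:
    # Classify a multimodal field so task code can branch on its origin.
    return "inline" if is_data_uri(value) else "url"
```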

Validation rules

Allowed image extensions: .jpg, .jpeg, .png, .gif, .webp
Allowed audio extensions: .mp3, .wav, .ogg, .m4a
Allowed video extensions: .mp4, .webm, .mov
Allowed media_type values: image, audio, video
Multimodal assets can be large. Start with small batches and validate model input formatting before scaling up.
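The rules above are easy to pre-check locally before pushing a large dataset. This sketch is not the SDK's own validator (its exact error types aren't documented here); `validate_media` is a hypothetical pre-flight check built from the allowed-extension lists:

```python
import os

# Extension whitelist mirroring the validation rules above.
ALLOWED_EXTENSIONS = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a"},
    "video": {".mp4", ".webm", ".mov"},
}

def validate_media(path: str, media_type: str) -> None:
    # Reject unknown media types, then check the file extension
    # (case-insensitively) against the whitelist for that type.
    allowed = ALLOWED_EXTENSIONS.get(media_type)
    if allowed is None:
        raise ValueError(f"Unsupported media_type: {media_type!r}")
    ext = os.path.splitext(path)[1].lower()
    if ext not in allowed:
        raise ValueError(f"{ext!r} is not an allowed {media_type} extension")
```

Running this over your asset paths before calling the add_* helpers catches bad files early, before any base64 encoding or upload work happens.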