TuneJury
TuneJury is an open reward model for music generation. It reads a prompt and an audio clip, and returns one preference score.
How it scores
Audio passes through the CLAP and MERT audio encoders. The prompt passes through the CLAP text encoder. A 2.8M-parameter MLP turns the three embeddings into a single scalar, trained pairwise on human A vs. B comparisons.
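The page does not spell out the pairwise objective. A common choice for A vs. B training is the Bradley-Terry loss, sketched here as an assumption rather than as the released training code. The function name and arguments are illustrative only.

```python
import math


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred clip wins.

    Assumed objective: P(preferred wins) = sigmoid(score_preferred - score_rejected).
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(margin)) equals softplus(-margin); branch to avoid overflow.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Two scalar scores for the same prompt, one per clip in the comparison.
loss_when_right = bradley_terry_loss(1.5, -0.5)   # model agrees with the rater
loss_when_wrong = bradley_terry_loss(-0.5, 1.5)   # model disagrees with the rater
```

Under this assumed loss, only the difference between the two scores matters, which is why the head can output an unbounded scalar.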
TuneJury · released
Released checkpoint. Encoders stay frozen, and only the head is trained.
By the numbers
Four open sources of human ratings. No pseudo-label augmentation.
AIME 12,480 · SongEval 2,491 · MusicPrefs 2,012 · Music Arena 571
One frozen reward
Mode 1
Generate N candidates and keep the highest-scoring one. Reward rises monotonically through N = 32 on four open-weights backbones.
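A minimal sketch of best-of-N selection. The `generate` and `score` callables are hypothetical stand-ins for a backbone sampler and the reward model. Neither is a real TuneJury API.

```python
import random


def best_of_n(prompt, generate, score, n, seed=0):
    """Draw n candidates for one prompt and return (best_clip, best_score)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    best_clip, best_score = None, float("-inf")
    for _ in range(n):
        clip = generate(prompt, rng)      # hypothetical backbone sampler
        clip_score = score(prompt, clip)  # hypothetical reward call
        if clip_score > best_score:
            best_clip, best_score = clip, clip_score
    return best_clip, best_score


# Toy stand-ins: a "clip" is a random number and its score is the number itself.
def toy_generate(prompt, rng):
    return rng.random()


def toy_score(prompt, clip):
    return clip


_, single_sample_score = best_of_n("dark trance", toy_generate, toy_score, n=1)
_, best_of_16_score = best_of_n("dark trance", toy_generate, toy_score, n=16)
```

With a shared seed, the first candidate is the same in both calls, so the best of 16 can never score below the single sample.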
Mode 2
Backpropagate the score through the sampler into the starting noise, DITTO style. The backbone stays frozen.
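A toy illustration of the idea, not the actual pipeline: the sampler here is a fixed affine map and the reward a negative squared distance, both invented for this sketch, with the gradient written out by hand. A real setup would rely on automatic differentiation through the full sampler and the reward model.

```python
def toy_sampler(noise):
    """Frozen toy 'backbone': a fixed map from starting noise to output."""
    return [2.0 * z + 1.0 for z in noise]


def toy_reward(output, target):
    """Toy differentiable reward: higher when the output is closer to target."""
    return -sum((x - t) ** 2 for x, t in zip(output, target))


def reward_grad_wrt_noise(noise, target):
    # Chain rule: dR/dz = dR/dx * dx/dz = -2 * (x - t) * 2
    return [-2.0 * (x - t) * 2.0 for x, t in zip(toy_sampler(noise), target)]


def optimize_noise(noise, target, steps=50, lr=0.05):
    """Gradient ascent on the starting noise; the sampler is never updated."""
    z = list(noise)
    for _ in range(steps):
        grad = reward_grad_wrt_noise(z, target)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z


start_noise = [0.3, -1.2]
target = [0.0, 0.5]
tuned_noise = optimize_noise(start_noise, target)
reward_before = toy_reward(toy_sampler(start_noise), target)
reward_after = toy_reward(toy_sampler(tuned_noise), target)
```

Only the noise vector changes between the two reward evaluations, mirroring the "same noise, pushed toward higher reward" comparison.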
Mode 3
Fine-tune a backbone on its own top-scoring outputs, mapping the trade-off between reward and distributional fidelity.
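One way the training set for this mode could be assembled, as a sketch. Grouping by prompt and the `keep_per_prompt` parameter are assumptions, and the page does not say how top-scoring outputs are chosen.

```python
from collections import defaultdict


def select_top_outputs(samples, keep_per_prompt=1):
    """Keep the highest-scoring clips for each prompt.

    samples: iterable of (prompt, clip_id, score) tuples.
    Returns a list of (prompt, clip_id) pairs to fine-tune on.
    """
    by_prompt = defaultdict(list)
    for prompt, clip_id, score in samples:
        by_prompt[prompt].append((score, clip_id))
    selected = []
    for prompt, scored in by_prompt.items():
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        selected.extend((prompt, clip_id) for _, clip_id in ranked[:keep_per_prompt])
    return selected


scored_samples = [
    ("fast garage", "clip_a", 0.2),
    ("fast garage", "clip_b", 0.9),
    ("dark trance", "clip_c", 0.5),
    ("dark trance", "clip_d", 0.1),
]
finetune_set = select_top_outputs(scored_samples)
```

Keeping fewer clips per prompt raises the average reward of the training set but narrows it, which is one side of the reward versus fidelity trade-off.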
Hear it
Each pair uses one prompt and one backbone. Only the reward signal changed the outcome.
MusicGen-medium. One random sample vs. the top pick of 16.
“A dark trance track featuring accordion, blending hypnotic rhythms with melancholic melodies and a pervasive, atmospheric mood.”
TangoFlux. The same noise, pushed toward higher reward.
“A melancholic rap piece driven by a steady drummachine beat, layered with subtle synth pads and a sparse electric guitar, creating a reflective, introspective atmosphere. …”
FluxAudio-S. Baseline vs. fine-tuned on its own best outputs.
“A fast garage track featuring an electric guitar, driven by raw energy and a loose, rhythmic feel.”
The listening demo pairs every sample with its TuneJury score.
The released scores
Seven open-license collections, scored with the released checkpoint. Drag the threshold and see what a score filter keeps.
Share kept = fraction of clips scoring above the threshold τ.
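The share kept can be computed directly from a list of scores. The function name and the example scores below are made up for illustration.

```python
def share_kept(scores, tau):
    """Fraction of clips whose score is strictly above the threshold tau."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s > tau) / len(scores)


example_scores = [0.1, 0.4, 0.6, 0.9]
kept = share_kept(example_scores, tau=0.5)  # 2 of 4 clips pass, so 0.5
```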
Anchor calibration
A reward model trained today meets systems released tomorrow. Anchor calibration fits one bias per new system on a handful of preference pairs, with the model itself left untouched.
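A sketch of how one bias per system might be fitted. It assumes each preference pair compares a clip from the new system with a clip from an already-calibrated anchor system, and that the bias is found by maximizing a Bradley-Terry likelihood. Neither detail is stated on this page, and `fit_system_bias` is a hypothetical name.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fit_system_bias(pairs, steps=500, lr=0.5):
    """Fit a single additive bias for one new system.

    pairs: list of (raw_score_new, raw_score_anchor, new_was_preferred).
    The raw scores come from the frozen reward model; only the bias moves.
    """
    bias = 0.0
    for _ in range(steps):
        # Gradient of the mean log-likelihood with respect to the bias.
        grad = sum(
            (1.0 if new_won else 0.0) - sigmoid(s_new - s_anchor + bias)
            for s_new, s_anchor, new_won in pairs
        ) / len(pairs)
        bias += lr * grad
    return bias


# Raw scores tie, yet listeners preferred the new system in 3 of 4 pairs,
# so the fitted bias should be positive.
toy_pairs = [(0.0, 0.0, True), (0.0, 0.0, True), (0.0, 0.0, True), (0.0, 0.0, False)]
system_bias = fit_system_bias(toy_pairs)
calibrated_score = 0.0 + system_bias
```

If every pair has the same winner, the likelihood has no finite maximum and the bias keeps growing with `steps`, so a real fit would likely need a regularizer or a cap.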
Unless a variant is named, every result on this page uses the released CLAP+MERT checkpoint.
Data: MTG-Jamendo · FMA · MagnaTagATune · OpenMIC · MidiCaps · MusicCaps · Song Describer Dataset · Music Arena · MusicPrefs · AIME · SongEval
Models & methods: LAION-CLAP · MERT · MuQ-MuLan · MusicGen · AudioLDM 2 · ACE-Step · TangoFlux · Stable Audio Open Small · MeanAudio · DITTO