Concept frequency was found to be predictive of performance across various models and tasks, although the scaling trend between concept frequency and performance weakens at high concept frequencies. Because of the task's societal relevance, Stable Diffusion was also tested on generating public figures, using a dataset of 50,000 scraped entities. Inconsistent trends in human-rated scores prompted a supplementary small-scale human evaluation. Overall, the results clarify how the concept frequency of pretraining data relates to model performance in zero-shot settings.
The human-rated scores retrieved from HEIM showed inconsistent trends, so a small-scale human evaluation was run to reassess them.
Stable Diffusion was tested on generating public figures because of the task's societal relevance, using a scraped dataset of 50,000 entities.
The scaling trend between concept frequency and performance weakens at high concept frequencies, so the relationship is not uniform across the full frequency range.
Concept frequency was computed from the text captions of LAION-Aesthetics and then used as a predictor of downstream task performance.
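As a rough illustration of what computing concept frequency from captions can look like, the sketch below counts how many captions mention each concept at least once. It is a simplified stand-in, not the authors' pipeline: the function name `concept_frequencies`, the case-insensitive whole-phrase matching, and the toy captions are all assumptions made for this example.

```python
import re


def concept_frequencies(captions, concepts):
    """Count, for each concept, the number of captions that mention it.

    Hypothetical helper: matching is case-insensitive and on whole-word
    boundaries, which is an assumption of this sketch.
    """
    patterns = {
        concept: re.compile(r"\b" + re.escape(concept.lower()) + r"\b")
        for concept in concepts
    }
    counts = {concept: 0 for concept in concepts}
    for caption in captions:
        text = caption.lower()
        for concept, pattern in patterns.items():
            # A caption contributes at most 1 per concept, however often it repeats it.
            if pattern.search(text):
                counts[concept] += 1
    return counts


# Toy captions standing in for a caption corpus (not real LAION-Aesthetics data).
captions = [
    "A portrait of Ada Lovelace in a study",
    "Ada Lovelace, oil on canvas",
    "A golden retriever on a beach",
]
freqs = concept_frequencies(
    captions, ["Ada Lovelace", "golden retriever", "tabby cat"]
)
print(freqs)
```

Counting captions rather than raw occurrences keeps a single repetitive caption from inflating a concept's frequency; a real pipeline would likely also need to handle name variants and inflected forms.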