Some companies are looking for alternative sources of training data now that the internet is proving too small to supply it, with options such as publicly available video transcripts and even AI-generated synthetic data.
OpenAI and Anthropic are exploring the use of synthetic data to train AI models, aiming to avoid issues like 'model collapse' by creating higher-quality synthetic data.
Anthropic admitted that its Claude 3 LLM was trained on 'data we generate internally,' indicating a move towards more controlled synthetic data usage.
Concerns that AI firms face a data shortage are prompting them to explore novel and sometimes controversial sources of training data.