The Atlantic Creates Searchable Database of AI Music Training Data
The Atlantic's Alex Reisner uncovers four datasets of music used to train AI models, making them searchable for the public.

The Atlantic reporter Alex Reisner recently uncovered four datasets of music being used to train AI models and made them fully searchable for the public. Two of the sets are enormous, comprising 12 million and 9 million tracks. The other two are much smaller, but still represent a substantial amount of training data, with over 100,000 songs each.
According to Reisner, the sets have been downloaded thousands of times. While it's impossible to know exactly who has used them, Google and Stability have both confirmed using them in research papers. Some of the sources, like the Free Music Archive dataset, are free to stream for personal use but restrict commercial use and redistribution. The datasets are now searchable through The Atlantic's database, offering a view into the music used to train AI models.
The move could affect both the development of AI-generated music and the wider music industry. Training AI models on large datasets has become increasingly common, but the lack of transparency about which data is used has raised concerns. By making these datasets searchable, Reisner aims to show what music goes into training AI music generators.
Why this matters: The database highlights how heavily AI developers rely on large datasets and raises questions about ownership, usage rights, and the potential impact on human musicians. As AI-generated music becomes more prevalent, knowing what data these models are trained on is crucial.
This development could lead to more transparency and accountability in AI development, but it also raises concerns that AI-generated music could disrupt traditional music industry business models. Developers and businesses will need to weigh the implications of using these datasets, while consumers may benefit from greater transparency about how AI-generated music is made. Many questions remain unanswered, including how these datasets will be updated and who will be responsible for ensuring the accuracy and fairness of AI-generated music.
Source: The Verge