A community effort to curate the best open post-training datasets.
We are currently working on OpenThoughts-Agent, a collaboration building the best open agent training datasets.
Our first project was curating open reasoning data recipes. OpenThoughts3, our best reasoning dataset recipe, is detailed in our release blog and the full paper.
About us
We are a collaboration led by researchers and engineers from Stanford, UC Berkeley, UT Austin, NYU, University of Washington, UCSD, ASU, CMU, UCLA, UNC Chapel Hill, TUM, LAION, and other partners focused on building the best datasets (and therefore the best models). See our previous work at datacomp.ai and mlfoundations.
OpenThoughts is powered by a broad ecosystem: academic and national lab clusters such as Juelich Supercomputing Center (JSC), TACC, ALCC Perlmutter, ZIH, and Oumi Exun by Lambda Labs; and supporters across the startup community including Daytona.io, Laude Institute, Bespoke Labs, and Oumi.ai, alongside long-standing partners such as the NSF IFML, UT Austin Machine Learning Lab, Toyota Research Institute, Lambda Labs, and the NHR Center of TU Dresden.