We are announcing Open Thoughts, an open-source effort to curate the best open reasoning datasets. Open Thoughts is a collaboration led by Bespoke Labs and the DataComp community from Stanford, UC Berkeley, UT Austin, UW, UCLA, UNC, TRI, and LAION.

Recent breakthroughs such as Sky-T1, STILL-2, and DeepSeek-R1 have shown that a few hundred thousand reasoning demonstrations suffice to substantially improve the reasoning capabilities of a language model. With the release of DeepSeek-R1, such thinking demonstrations can now be synthetically created at low cost and at scale.
While this process of reasoning distillation works surprisingly well, the corresponding datasets and data generation strategies unfortunately remain closed. Moreover, there is a rich design space in data generation for reasoning that the community is only beginning to explore.
The goal of Open Thoughts is to bridge this gap and create state-of-the-art open reasoning datasets. In the process, we are publicly iterating on and sharing the best datasets and data recipes for reasoning data. We invite the community to join us to build, explore, and push the frontier of reasoning models forward together.
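To make the distillation setup concrete, here is a minimal sketch of how a teacher model's reasoning trace might be packaged into a chat-style supervised fine-tuning record. The `<think>…</think>` convention follows DeepSeek-R1-style traces; the function name, field names, and format are illustrative assumptions, not the actual Open Thoughts data recipe.

```python
# Toy sketch: wrapping a teacher's reasoning trace and final answer into a
# single chat-format SFT example. Illustrative only -- not the Open Thoughts
# pipeline; field names and the <think> tag convention are assumptions.

def build_sft_record(question: str, reasoning: str, answer: str) -> dict:
    """Combine a question, a reasoning trace, and a final answer into one
    chat-style training example for supervised fine-tuning."""
    completion = f"<think>\n{reasoning}\n</think>\n{answer}"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": completion},
        ]
    }

record = build_sft_record(
    question="What is 12 * 8?",
    reasoning="12 * 8 = 12 * 10 - 12 * 2 = 120 - 24 = 96.",
    answer="96",
)
```

A dataset of such records, generated at scale from a strong teacher, is what "reasoning distillation" fine-tunes the student model on.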
Model | AIME24 | MATH500 | GPQA-Diamond | LiveCodeBench Easy | LiveCodeBench Medium | LiveCodeBench Hard |
---|---|---|---|---|---|---|
OpenThinker-7B | 31.3 | 83.0 | 42.4 | 75.3 | 28.6 | 6.5 |
Bespoke-Stratos-7B | 22.7 | 79.6 | 38.9 | 71.4 | 25.2 | 0.8 |
DeepSeek-R1-Distill-Qwen-7B | 60.0 | 88.2 | 46.9 | 79.7 | 45.1 | 14.6 |
gpt-4o-2024-08-06 | 8.7 | 75.8 | 46.5 | 87.4 | 42.7 | 8.9 |
o1-mini | 64.0 | 85.6 | 60.0 | 92.8 | 74.7 | 39.8 |
Today, we are also releasing our first dataset, OpenThoughts-114k, and our first model, OpenThinker-7B, fine-tuned from Qwen2.5-7B-Instruct. We scaled up the data strategy from Bespoke-Stratos-17k, yielding a significant improvement over Bespoke-Stratos-7B. The numbers reported in the table above were evaluated with our open-source tool Evalchemy.
We are just getting started: an exciting new field of research has opened up. If you want to contribute to or sponsor the Open Thoughts effort, please get in touch or open an issue on GitHub. Come join us on this journey!
Citation
@misc{guha2025openthoughtsdatarecipesreasoning,
  title={OpenThoughts: Data Recipes for Reasoning Models},
  author={Etash Guha and Ryan Marten and Sedrick Keh and Negin Raoof and Georgios Smyrnis and Hritik Bansal and Marianna Nezhurina and Jean Mercat and Trung Vu and Zayne Sprague and Ashima Suvarna and Benjamin Feuer and Liangyu Chen and Zaid Khan and Eric Frankel and Sachin Grover and Caroline Choi and Niklas Muennighoff and Shiye Su and Wanjia Zhao and John Yang and Shreyas Pimpalgaonkar and Kartik Sharma and Charlie Cheng-Jie Ji and Yichuan Deng and Sarah Pratt and Vivek Ramanujan and Jon Saad-Falcon and Jeffrey Li and Achal Dave and Alon Albalak and Kushal Arora and Blake Wulfe and Chinmay Hegde and Greg Durrett and Sewoong Oh and Mohit Bansal and Saadia Gabriel and Aditya Grover and Kai-Wei Chang and Vaishaal Shankar and Aaron Gokaslan and Mike A. Merrill and Tatsunori Hashimoto and Yejin Choi and Jenia Jitsev and Reinhard Heckel and Maheswaran Sathiamoorthy and Alexandros G. Dimakis and Ludwig Schmidt},
  year={2025},
  eprint={2506.04178},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.04178},
}