1 Text
The Pile: An 800GB Dataset of Diverse Text for Language Modeling (Gao et al. 2020)
References
Gao, Leo, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, et al. 2020. “The Pile: An 800GB Dataset of Diverse Text for Language Modeling.” arXiv. https://doi.org/10.48550/arXiv.2101.00027.