1 Text

The Pile: An 800GB Dataset of Diverse Text for Language Modeling (Gao et al. 2020)

References

Gao, Leo, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, et al. 2020. “The Pile: An 800GB Dataset of Diverse Text for Language Modeling.” arXiv. https://doi.org/10.48550/arXiv.2101.00027.