Tag: Data Replay
All the articles with the tag "Data Replay".
-
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
This paper introduces TiC-LM, a web-scale benchmark for time-continual LLM pretraining built from 114 Common Crawl dumps. It shows that replay combined with autoregressive schedules can match Oracle retraining on general web data with less compute, though trade-offs persist across domains.