Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models

BTM improves in- and out-of-domain perplexities compared to GPT-style Transformer LMs, and its gains grow with the number of domains, suggesting that more aggressive parallelism could be used to efficiently train larger models in future work.
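
To make the "merge" step in the title concrete, here is a minimal Python sketch of one way branched domain experts could be combined after their embarrassingly parallel training: a weighted parameter average of expert checkpoints that share a common architecture (having all branched from the same seed LM). The function name merge_experts, the PyTorch state-dict representation, and the uniform default weighting are illustrative assumptions, not the paper's released implementation.

import torch

def merge_experts(expert_state_dicts, weights=None):
    """Weighted parameter average of expert LMs that share one architecture."""
    n = len(expert_state_dicts)
    if weights is None:
        # Uniform average over the domain experts (illustrative default).
        weights = [1.0 / n] * n
    merged = {}
    for name in expert_state_dicts[0]:
        # Average each parameter tensor across experts, weighted per domain.
        merged[name] = sum(
            w * sd[name].float() for w, sd in zip(weights, expert_state_dicts)
        )
    return merged

The merged state dict can then be loaded into a model with the shared architecture; non-uniform weights would let domains closer to the target data contribute more to the merged model.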

Published: Fri Aug 05 2022
Citations: 48
By Margaret Li, Suchin Gururangan, and others


