inference
gpu
Shared Backbones: Loading Weights Once, Serving Many Models
Many multimodal and multi-task models share the same underlying text encoder or LLM backbone. This post explores loading a shared backbone into memory once and letting multiple task-specific heads reuse it, rather than keeping a separate copy of the same weights for every model.
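The following is a minimal sketch of the idea. It uses a plain-Python stand-in for the model: the `Backbone`, `Head`, and `load_backbone` names and the toy dot-product "forward pass" are hypothetical and not taken from any library. Caching the loader by backbone name is one way to make sure every head receives the same weights object instead of loading its own copy.

```python
from functools import lru_cache

# Counts how many times the (hypothetical) expensive load actually runs.
LOAD_CALLS = {"count": 0}


class Backbone:
    """Hypothetical stand-in for a large text encoder or LLM backbone."""

    def __init__(self, name: str):
        LOAD_CALLS["count"] += 1
        self.name = name
        # Toy weights; in practice this would be gigabytes resident on the GPU.
        self.weights = [0.5, -1.0, 2.0]

    def encode(self, tokens: list[float]) -> float:
        # Toy "forward pass": a dot product with the shared weights.
        return sum(w * t for w, t in zip(self.weights, tokens))


@lru_cache(maxsize=None)
def load_backbone(name: str) -> Backbone:
    # Cached per name, so every head that asks for the same backbone
    # gets the same object rather than a second copy of the weights.
    return Backbone(name)


class Head:
    """Hypothetical task-specific head that reuses a shared backbone."""

    def __init__(self, backbone_name: str, scale: float, bias: float):
        self.backbone = load_backbone(backbone_name)
        self.scale = scale
        self.bias = bias

    def __call__(self, tokens: list[float]) -> float:
        # Run the shared backbone, then apply this head's own parameters.
        features = self.backbone.encode(tokens)
        return self.scale * features + self.bias


# Two heads for different tasks, both built on the same backbone.
classifier = Head("text-encoder", scale=1.0, bias=0.0)
regressor = Head("text-encoder", scale=0.5, bias=1.0)
```

Both heads request `"text-encoder"`, but the backbone is constructed only once and `classifier.backbone is regressor.backbone` holds, so only the small per-head parameters are duplicated. In a real framework the same principle would apply to sharing one module instance between heads; concerns such as device placement and concurrent requests are outside the scope of this sketch.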