Hugging Face Quietly Transforms Storage: The Great Migration from Git LFS to Xet

A Seamless Revolution in AI Storage

Hugging Face has migrated a large share of its storage infrastructure from Git LFS to Xet, a new high-performance storage backend that changes how AI models and datasets are stored and transferred. More impressive still, it happened quietly, without disrupting user workflows or drawing headlines. The shift reflects a deliberate approach to scalability, user experience, and readiness for the next generation of AI workloads.

Below is an in-depth look at this historic migration and why it matters to developers, data scientists, and AI infrastructure teams worldwide.

📦 From LFS to Lightning: A Hugging

Earlier this year, Hugging Face introduced Xet to a small portion of its users, rerouting about 6% of Hub downloads through the new infrastructure. Within six months, adoption had grown sharply: more than 500,000 repositories, holding 20 petabytes of data, had moved to Xet. By May, Xet had become the default storage backend for new users and organizations.

The migration was remarkably smooth, prompting only a handful of GitHub issues, forum posts, and Discord chats, an extraordinary feat for a change of this scale. The key to this success? Years of prior experience developing the Content Addressed Store (CAS), a robust Rust-based client, and middleware such as the Git LFS Bridge, which maintained backward compatibility and smoothed the transition.

Hugging Face designed Xet to coexist with Git LFS. Users weren’t forced to overhaul their workflows or download new tools. Instead, background content migrations ran silently, ensuring zero disruption while pushing terabytes and petabytes of content into Xet. Whether users were on legacy clients or the latest Xet-aware systems, everything just worked.

Technically, Xet chunks files on upload and reconstructs them on download using CAS and S3. If users had older clients, the Git LFS Bridge stepped in to emulate the LFS experience while silently transitioning data on the backend.
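
The chunk-and-reconstruct idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the ChunkStore class, the tiny fixed chunk size, and the use of SHA-256 as the content address are hypothetical, and the article does not describe how Xet actually sizes or hashes its chunks.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo (hypothetical value)


class ChunkStore:
    """In-memory stand-in for a content-addressed store (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # content hash -> chunk bytes

    def upload(self, data: bytes) -> list[str]:
        """Split data into chunks, store each under its hash, return the manifest."""
        manifest = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already present is not stored a second time.
            self.chunks.setdefault(key, chunk)
            manifest.append(key)
        return manifest

    def download(self, manifest: list[str]) -> bytes:
        """Rebuild the original bytes by concatenating chunks in manifest order."""
        return b"".join(self.chunks[key] for key in manifest)


store = ChunkStore()
original = b"model weights go here"
manifest = store.upload(original)
restored = store.download(manifest)
print(restored == original)
```

The manifest is the only thing a client needs in order to rebuild a file, which is why the same stored chunks can serve more than one file.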

The migration process was orchestrated through webhooks and a distributed job queue that assigned file batches to migration worker pods. These pods downloaded LFS content, uploaded it to Xet using xet-core, and enabled repositories one by one. Large-scale migrations, like those for bartowski (500 TB), RichardErkhov (1.7 PB), and mradermacher (6.1 PB), stress-tested and ultimately refined the system.
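
As a rough illustration of the batch-and-worker pattern described above, the sketch below uses threads and an in-process queue in place of pods and a distributed job queue. The function names, batch size, and the migrate_file placeholder are all hypothetical; the real workers used xet-core, whose API is not shown in this article.

```python
import queue
import threading


def make_batches(files, batch_size):
    """Group file names into fixed-size batches."""
    return [files[i:i + batch_size] for i in range(0, len(files), batch_size)]


def migrate_file(name):
    # Placeholder for "download the LFS content, upload it to Xet".
    return f"migrated:{name}"


def worker(jobs, results, lock):
    """Pull batches until the queue is empty, recording each migrated file."""
    while True:
        try:
            batch = jobs.get_nowait()
        except queue.Empty:
            return
        migrated = [migrate_file(name) for name in batch]
        with lock:
            results.extend(migrated)


def run_migration(files, batch_size=2, num_workers=3):
    jobs = queue.Queue()
    for batch in make_batches(files, batch_size):
        jobs.put(batch)
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


files = [f"shard-{i}.bin" for i in range(5)]
migrated = run_migration(files)
print(migrated)
```

Because every batch is an independent unit of work, adding workers raises throughput without changing the queueing logic, which is presumably what makes this pattern attractive at petabyte scale.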

After resolving network and disk I/O bottlenecks and scaling CAS throughput by nearly 10x—from 35 Gb/s to over 300 Gb/s—the infrastructure was ready to serve even the largest AI workloads.
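
A back-of-the-envelope calculation shows what that jump means in practice. The sketch below uses only the figures quoted in this article (35 Gb/s, 300 Gb/s, and the 6.1 PB migration) and assumes, unrealistically, that the entire CAS throughput is dedicated to a single migration and that petabytes are decimal. Since the article says "over 300 Gb/s", the computed speedup is a lower bound.

```python
OLD_GBPS = 35    # gigabits per second, before scaling
NEW_GBPS = 300   # gigabits per second, after scaling (stated as "over 300")

speedup = NEW_GBPS / OLD_GBPS


def transfer_days(petabytes, gbps):
    """Days needed to move the given volume at a sustained rate."""
    bits = petabytes * 1e15 * 8          # decimal petabytes to bits
    seconds = bits / (gbps * 1e9)        # gigabits per second to bits per second
    return seconds / 86_400              # seconds to days


days_old = transfer_days(6.1, OLD_GBPS)
days_new = transfer_days(6.1, NEW_GBPS)

print(f"speedup: at least {speedup:.1f}x")
print(f"6.1 PB at {OLD_GBPS} Gb/s: {days_old:.1f} days")
print(f"6.1 PB at {NEW_GBPS} Gb/s: {days_new:.1f} days")
```

Under these assumptions the largest migration mentioned here drops from a bit over two weeks of sustained transfer to under two days.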

By designing Xet around two principles, “Do no harm” and “Drive impact fast,” Hugging Face avoided a forced cut-over. They rolled out hf-xet, hardened it, and integrated it into the main client, all while keeping uploads and downloads compatible with existing tools.

Xet has now been deployed to support major players like Meta, OpenAI, Google, and Qwen, and it’s only getting bigger. Starting this month, all Hugging Face users will gain access to Xet, and their repositories will automatically migrate from LFS. Soon, the entire Xet protocol and infrastructure stack will be open-sourced, setting a new benchmark for scalable AI infrastructure.

💡 What Undercode Say:

Revolutionizing Storage for the AI Era

Hugging Face’s migration to Xet is more than a backend swap: it changes how AI data is stored, accessed, and shared. The transition reflects a clear understanding of the demands of modern AI workloads, which are increasingly defined by massive datasets, frequent access, and fast iteration cycles.

By transferring files as chunks, Xet cuts the bandwidth and time each transfer needs. The Content Addressed Store model minimizes redundancy, because identical content is stored once under its address, which saves storage while accelerating downloads. It’s a smart architecture designed for tomorrow’s AI pipelines.
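
To make the redundancy claim concrete, here is a small sketch of how a content-addressed scheme can skip data the store already holds. It assumes fixed-size chunks and SHA-256 addresses purely for simplicity; both helper functions are hypothetical, and the article does not say which chunking scheme Xet uses. Fixed-size chunks also only line up when bytes are replaced in place, so an insertion would shift every later chunk.

```python
import hashlib


def chunk_hashes(data: bytes, size: int = 8) -> list[str]:
    """Hash each fixed-size chunk of the data."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


def new_chunks_needed(known: set[str], data: bytes, size: int = 8) -> int:
    """Count distinct chunks of the data that the store does not hold yet."""
    return sum(1 for h in set(chunk_hashes(data, size)) if h not in known)


# Two versions of a file that differ only in their last 8 bytes.
v1 = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC"
v2 = b"AAAAAAAA" + b"BBBBBBBB" + b"DDDDDDDD"

known = set(chunk_hashes(v1))          # chunks stored after uploading v1
to_upload = new_chunks_needed(known, v2)

print(f"chunks in v2: {len(chunk_hashes(v2))}, new chunks to upload: {to_upload}")
```

Here only one of the three chunks in the second version has to be sent, which is the kind of saving that matters when successive model checkpoints share most of their bytes.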

From an engineering standpoint, the elegance of the Git LFS Bridge ensures that legacy users aren’t left behind. It acts as a compatibility layer, quietly intercepting and transforming requests to mimic the LFS experience while moving data to the new system in the background.

What’s especially remarkable is the scalability and fault tolerance. During bartowski’s 500 TB migration, Hugging Face uncovered and patched real-world bottlenecks such as /tmp shard file handling and EBS throughput. They adapted fast, and those lessons became blueprints for even larger migrations.

Moreover, Hugging Face showed that seamless migrations at petabyte scale are not just possible but repeatable. With RichardErkhov (1.7 PB) and mradermacher (6.1 PB) moving 7.8 PB between them, Xet proved its robustness under pressure.

The migration also signals a broader trend toward content-aware, distributed storage solutions in AI. Tools like xet-core and hf-xet reflect an industry moving away from monolithic repositories toward granular, reproducible, and high-speed data systems.

And from a community perspective, the rollout couldn’t be more user-friendly. No hard deadlines. No broken workflows. Just faster transfers, smarter storage, and a smoother developer experience.

As open-source contributors and enterprise users embrace Xet, the Hugging Face Hub becomes a universal platform, not just for models and datasets but for an entire AI development lifecycle powered by resilient infrastructure.

✅ Fact Checker Results 🧐

Hugging Face has successfully migrated 500,000+ repos and 20 PB to Xet.

CAS throughput increased nearly 10x, enabling high-speed data access.

Git LFS Bridge maintains full backward compatibility with legacy clients.

🔮 Prediction: What Comes Next? 🚀

With the infrastructure now proven at scale, Hugging Face is poised to fully retire Git LFS in the long term. Expect the Xet protocol to become an industry standard for large-scale ML data storage. Open-sourcing the full Xet stack will likely accelerate adoption beyond Hugging Face, making it a staple in AI ecosystems. Chunk-based storage, seamless client integration, and cloud-native scaling are no longer optional; they’re the future of AI infrastructure.

References:

Reported By: huggingface.co
