Hugging Face is one of the more prominent platforms for hosting AI models. Whether a model comes from ByteDance, Google, or a startup, it is likely to be listed there.
In 2024, Hugging Face acquired XetHub, a Seattle-based company that built a platform for developing and deploying GenAI applications. The goal was to use XetHub's technology to replace Git Large File Storage (LFS) with a more capable storage backend for the Hub's repositories.
Fast-forward to 2025, and Hugging Face has begun migrating its first model and dataset repositories from LFS to Xet storage.
The Limitations of Git LFS Storage for AI Repositories
Git LFS is an open-source Git extension for versioning large files. It replaces large files such as audio, video, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server.
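For reference, the pointer that Git LFS checks into the repository in place of a large file is a small text stub like the one below. The format comes from the Git LFS specification; the hash and size values here are purely illustrative:

```text
version https://git-lfs.github.com/spec/v1
oid sha256:98ea6e4f216f2fb4b69fff9b3a44842c38686ca685f3f55dc48c5d3fb1107be4
size 4296876544
```

Git sees only this tiny pointer; the actual multi-gigabyte payload lives on the remote LFS server.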
At the time of writing, Hugging Face still uses the technology in combination with Amazon S3, a cloud storage service, for remote storage. By September 20, 2024, the total volume of files hosted on Hugging Face had reached an impressive 29 petabytes.

However, the company clarified that repositories on the Hugging Face Hub differ from those on software development platforms. While LFS was designed for large files, the files typical of AI workloads are significantly larger, so the company had always planned to transition to its own optimised storage and versioning backend at some point.
“LFS deduplicates at the file level. Even tiny edits create a new revision to upload in full; painful for the multi-gigabyte files found in many Hub repositories,” Hugging Face explained in a blog post.
Introducing Xet Storage and Its Use Cases to Hugging Face
To overcome the limitations of Git LFS highlighted above, Hugging Face started to implement Xet storage.
“When a file backed by Xet storage is updated, only the modified data is uploaded to remote storage, significantly saving on network transfers. For many workflows, like incremental updates to model checkpoints or appending/inserting new data into a dataset, this improves iteration speed for yourself and your collaborators,” Hugging Face pointed out.
Xet storage uses content-defined chunking (CDC) to deduplicate at the byte level, with chunk boundaries computed by a rolling hash algorithm. When a small piece of metadata in a GGUF model is modified, only the altered chunks are transmitted. Xet storage also remains backwards compatible with Git LFS.
With these technical merits, Hugging Face anticipated use cases where users would no longer need to re-upload a 10 GB dataset file after adding a single row. Instead, they could re-upload only the few chunks that changed, including the new row.
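The idea can be illustrated with a toy rolling-hash chunker. This is a simplified sketch, not Xet's actual implementation: the hash function, window size, and chunk-size parameters below are all assumptions chosen for readability.

```python
import hashlib
import random

def rolling_chunks(data: bytes, window: int = 16, mask: int = (1 << 10) - 1):
    """Split `data` into content-defined chunks.

    A chunk boundary is declared wherever the rolling hash of the last
    `window` bytes matches a fixed bit pattern, so boundaries depend on
    local content rather than absolute offsets: an insertion early in the
    file shifts bytes but leaves later chunk boundaries unchanged.
    """
    base, mod = 257, (1 << 61) - 1
    pow_w = pow(base, window - 1, mod)   # used to drop the oldest byte
    h, start, chunks = 0, 0, []
    for i, b in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * pow_w) % mod
        h = (h * base + b) % mod
        # `mask` controls the average chunk size (~1 KiB here);
        # `window` doubles as a minimum chunk size.
        if i + 1 - start >= window and (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(200_000))
edited = original[:1000] + b"one new row" + original[1000:]  # small insertion

a = {hashlib.sha256(c).digest() for c in rolling_chunks(original)}
b = {hashlib.sha256(c).digest() for c in rolling_chunks(edited)}
print(f"{len(b - a)} changed chunks out of {len(b)}")
```

Because the hash depends only on a sliding window of recent bytes, the chunk boundaries "resynchronise" shortly after the insertion point, so only the handful of chunks touching the edit need to be re-uploaded; a fixed-size chunker would instead shift every chunk after the edit.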
The company shared an example in which an Xet-backed version of the gemma-2-9b-it-GGUF repository totalled 97 GB, down from the model's original 191 GB: a saving of approximately 94 GB, or nearly 50%, which should also make the model easier for everyone to download.
Hugging Face’s Migration Success
On March 18, Hugging Face shared a proof-of-concept for the first step of its repository migration.
The company stated that the migration shifted approximately 6% of the Hub's download traffic to its Xet infrastructure. In the process, Hugging Face transferred all target repositories, totalling 4.5 TB, into Xet storage.

While the team faced challenges such as unexpected load imbalance and download overhead (as shown in the image above) on its storage system, the initial migration was successful, and Xet is now live on the Hugging Face Hub.
Users of the platform can expect shorter upload and download times and faster iteration on large files.
The company encourages upgrading to hf_xet to get these benefits, though legacy clients remain compatible via the LFS Bridge.
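A minimal way to opt in, assuming a standard pip setup (the extra name follows Hugging Face's documentation at the time of writing):

```shell
# Install or upgrade huggingface_hub with the hf_xet extra so uploads and
# downloads use chunk-level Xet transfer instead of going through the LFS Bridge.
pip install -U "huggingface_hub[hf_xet]"
```

Older clients need no changes: their LFS-style requests are served through the LFS Bridge.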