Steven Liu

Welcome to the Hugging-Verse

Mar 10, 2025

I often see people ask, what is Hugging Face? The most common answer is usually some variant of "Hugging Face is the GitHub of machine learning".

It is a good one-liner considering the breadth of Hugging Face's quest to enable collaborative machine learning (ML). But it also hides the depth of what we do. The Hugging Face ecosystem, or Hugging-Verse, is quite expansive and it can often feel like we're doing too much without a strong focus.

But I believe the Hugging-Verse needs to be expansive precisely because we're pursuing such an ambitious quest.

The Hugging-Verse. Image generated from the linoyts/huggy_flux_fal_lora checkpoint.

Think about games like Baldur's Gate 3 or Elden Ring. They're enormous games you can easily spend 100+ hours on in a single playthrough. Alongside the main quest, there are many side quests that add a ton of richness, worldbuilding, and depth.

Similarly, the main Hugging Face libraries and Hub platform advance the main quest. All the other libraries, research projects, courses, tools and services are side quests that enrich the Hugging-Verse.

It can be overwhelming if you're new though, so here is my high-level walkthrough.

If you only take one thing away, remember that side quests are optional. If you aren't sure where to start, focus on a main library like Transformers and on the Hub first.

libraries

The Hugging-Verse contains many libraries related to nearly every aspect of ML.

The main ones, like Transformers and Diffusers, provide access to pretrained models for inference and fine-tuning. They're able to do a little bit of everything.
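
To make "access to pretrained models for inference" a bit more concrete, here is a minimal sketch using the Transformers `pipeline` function. The checkpoint name is my own illustrative pick, not one this post depends on, and `top_label` is a hypothetical helper I added to read the result.

```python
from typing import Any

# Assumption: an illustrative sentiment checkpoint from the Hub; swap in any
# text-classification model you prefer.
DEFAULT_MODEL = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"


def load_classifier(model_id: str = DEFAULT_MODEL) -> Any:
    """Return a text-classification pipeline backed by a pretrained checkpoint."""
    # Imported lazily so the rest of this snippet works without Transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def top_label(predictions: list) -> str:
    """Pick the highest-scoring label from pipeline-style output.

    Expects a list of dicts with "label" and "score" keys, which is the
    shape a text-classification pipeline returns for a single input.
    """
    return max(predictions, key=lambda p: p["score"])["label"]
```

Calling `top_label(load_classifier()("I love the Hugging-Verse!"))` downloads the checkpoint from the Hub on first use, so it needs Transformers, a backend like PyTorch, and a network connection.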

Side libraries focus on specific ML topics such as:

Where you start in the Hugging-Verse depends on what you're trying to do. There aren't defined progression paths.

Two general routes I think most people take are:

  1. Learning ML with Transformers.
  2. Adapting a side library for their own work or research.

My recommendation is to pick the first route if you're a beginner. Otherwise, feel free to create your own progression path.

hub

The Hub is the most visible part of Hugging Face. This is probably what most people think of when they think of Hugging Face.

It is a Git-based platform for accessing and sharing models, datasets, and Spaces. Every public repository provides free storage for your ML artifacts. Storage is based on Xet, a more efficient storage system. Social features like posts, articles, and the Community tab encourage collaboration and discussion.
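
Because repositories are Git-based, every file lives at a specific revision (a branch, tag, or commit). Here is a small sketch of how repository and file URLs are laid out on the Hub. The helper names and the example repository IDs are hypothetical, and in practice you'd more likely use the `huggingface_hub` library than build URLs by hand.

```python
from urllib.parse import quote

HUB_ENDPOINT = "https://huggingface.co"

# Dataset and Space repositories live under a type prefix; model repositories do not.
REPO_TYPE_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def repo_url(repo_id: str, repo_type: str = "model") -> str:
    """URL of a repository page, which also works as its Git remote."""
    return f"{HUB_ENDPOINT}/{REPO_TYPE_PREFIX[repo_type]}{repo_id}"


def file_url(repo_id: str, filename: str, revision: str = "main",
             repo_type: str = "model") -> str:
    """URL of a single file at a given Git revision (branch, tag, or commit)."""
    return (
        f"{repo_url(repo_id, repo_type)}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


# Hypothetical repository IDs, used only to show the URL shapes.
print(repo_url("some-user/some-dataset", repo_type="dataset"))
print(file_url("some-user/some-model", "config.json"))
```

Pinning `revision` to a commit hash instead of `main` is what makes a download reproducible even if the repository changes later.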

Model repositories include a widget to run inference with a model in your browser. The widget is powered by Inference Providers like Replicate and fal.

Dataset repositories have a Dataset Viewer for easily previewing a dataset's contents. You can run SQL queries (or ask AI to craft a SQL query for you) on the dataset with the Data Studio feature to explore it in more detail. Again, this all takes place in the browser.
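
If you haven't explored a dataset with SQL before, here is a rough sketch of the kind of query you might run. It uses Python's built-in `sqlite3` as a local stand-in, not the engine the Hub uses, and the table name, columns, and rows are all made up for illustration.

```python
import sqlite3

# Hypothetical rows standing in for a dataset's "train" split.
ROWS = [
    ("I loved it", "positive"),
    ("Terrible plot", "negative"),
    ("Great acting", "positive"),
]


def label_counts(rows):
    """Count examples per label, most frequent label first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE train (text TEXT, label TEXT)")
    conn.executemany("INSERT INTO train VALUES (?, ?)", rows)
    query = """
        SELECT label, COUNT(*) AS n
        FROM train
        GROUP BY label
        ORDER BY n DESC, label
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(label_counts(ROWS))
```

A quick label count like this is often the first thing worth checking, since it tells you whether a dataset is balanced before you train on it.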

Spaces is a scaffold to create and deploy ML apps with Gradio, Docker, React, Svelte, and more. A free tier Space runs on basic CPU hardware with 16GB of RAM, but you can upgrade to more powerful GPUs and persistent storage if your app needs it. PRO subscribers have access to ZeroGPU, a shared cluster of A100 GPUs that are automatically allocated to a Space to complete a workload and then released to the next Space.
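
The allocate-then-release idea behind ZeroGPU can be pictured with a toy model. To be clear, this is only my simplified sketch of a shared pool with a first-come, first-served queue. The class and its behavior are hypothetical and say nothing about how the real scheduler is implemented.

```python
from collections import deque


class SharedGpuPool:
    """Toy model of a GPU pool shared across Spaces (not the real scheduler)."""

    def __init__(self, num_gpus: int):
        self.free = deque(range(num_gpus))  # GPU ids nobody is using
        self.waiting = deque()              # Spaces queued for a GPU
        self.assigned = {}                  # Space name -> GPU id

    def request(self, space: str):
        """Give the Space a free GPU, or queue it and return None."""
        if self.free:
            self.assigned[space] = self.free.popleft()
            return self.assigned[space]
        self.waiting.append(space)
        return None

    def release(self, space: str) -> int:
        """Release the Space's GPU and hand it to the next waiting Space, if any."""
        gpu = self.assigned.pop(space)
        if self.waiting:
            self.assigned[self.waiting.popleft()] = gpu
        else:
            self.free.append(gpu)
        return gpu


pool = SharedGpuPool(num_gpus=1)
pool.request("space-a")   # gets GPU 0
pool.request("space-b")   # no GPU free, so it waits
pool.release("space-a")   # GPU 0 moves straight to space-b
print(pool.assigned)
```

In an actual Space you don't manage any of this yourself. You mark the functions that need a GPU with the `spaces.GPU` decorator, and the allocation happens for you while that function runs.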

huggingchat

HuggingChat is an open version of ChatGPT, providing access to some of the latest models like DeepSeek-R1. It uses a router, Omni, to automatically select the best route and model for a message. The system prompt of each model is modifiable.

HuggingChat is powered by Inference Providers, but you can use a different provider like OpenRouter.

research

Hugging Face is also engaged in research projects that empower the entire ML ecosystem. The goal isn't necessarily to train the best models. Instead, we're trying to create new and interesting research that benefits everyone.

Collaboration >>> competition.

As an example, The Ultra-Scale Playbook is the culmination of everything the research team has learned about distributed training. It is freely available to anyone interested in scaling the training of large language models to thousands of GPUs.

Some other research projects include:

These types of projects enable other researchers to build on top of our work and push the field forward together.

courses

Educational content at hf.co/learn, such as the courses and cookbook, provides an accessible starting point for learning ML. A large part of our quest hinges on lowering the barrier for everyone to learn ML and become active participants.

There are several courses (agents, reinforcement learning, diffusion, etc.) available on hf.co/learn. You can also find more courses, such as quantization fundamentals, that we collaborated on with other learning platforms like DeepLearning.AI.