Welcome to the Hugging-Verse
@stevhliu|Mar 10, 2025 (9 months ago)5 views
I often see people ask, what is Hugging Face? The most common answer is usually some variant of "Hugging Face is the GitHub of machine learning".
It is a good one-liner considering the breadth of Hugging Face's quest to enable collaborative machine learning (ML). But it also hides the depth of what we do. The Hugging Face ecosystem, or Hugging-Verse, is quite expansive and it can often feel like we're doing too much without a strong focus.
But I believe the Hugging-Verse needs to be expansive precisely because we're pursuing such an ambitious quest.
Image generated from the linoyts/huggy_flux_fal_lora checkpoint.Think about games like Baldur's Gate 3 or Elden Ring. They're enormous games you can easily spend 100+ hours on in a single playthrough. Alongside the main quest, there are many side quests that add a ton of richness, worldbuilding, and depth.
Similarly, the main Hugging Face libraries and Hub platform advance the main quest. All the other libraries, research projects, courses, tools and services are side quests that enrich the Hugging-Verse.
It can be overwhelming if you're new though, so here is my high-level walkthrough.
#libraries
The Hugging-Verse contains many libraries related to nearly every aspect of ML.
The main ones, like Transformers and Diffusers, provide access to pretrained models for inference and fine-tuning. They're able to do a little bit of everything.
Side libraries focus on specific ML topics such as:
- distributed training (Accelerate, picotron, nanotron)
- evaluation (Lighteval)
- on-device (Transformers.js, Optimum)
Where you start in the Hugging-Verse depends on what you're trying to do. There aren't defined progression paths.
Two general routes I think most people take are:
- Start learning ML with Transformers.
- Adapting a side library for their own work or research.
My recommendation is to pick 1 if you're a beginner. Otherwise, feel free to create your own progression path.
#hub
The Hub is the most visible part of Hugging Face. This is probably what most people think of when they think of Hugging Face.
It is a Git-based platform with to access or share models, datasets, and Spaces. Every public repository provides free storage for your ML artifacts. Storage is based on Xet, a more efficient storage system. Social features like posts, articles, and the Community tab encourage collaboration and discussion.
Model repositories include a widget to run inference with a model on your browser. The widget is powered by Inference Providers like Replicate and fal.
Dataset repositories have a Dataset Viewer for easily previewing a datasets contents. You can run SQL queries (or ask AI to craft a SQL query for you) on the dataset with the Data Studio feature to explore it in more detail. Again, this all takes place in the browser.
You can now query datasets with natural language 🗣️ ▶︎ Powered by Deepseek V3 🐳 via Inference Providers ▶︎ Use SQL to query datasets entirely in the browser via @duckdb. You'll find the best performance for smaller datasets
Spaces is a scaffold to create and deploy ML apps with Gradio, Docker, React, Svelte, and more. A free tier Space runs on a basic 16GB CPU, but you can upgrade to more powerful GPUs and persistent storage if your app needs it. PRO subscribers have access to ZeroGPU, a shared cluster of A100 GPUs that are automatically allocated to a Space to complete a workload and then released to the next Space.

#huggingchat
HuggingChat is an open version of ChatGPT, providing access to some of the latest models like DeepSeek-R1. It uses a router, Omni, to automatically select the best route and model for a message. The system prompt of each model is modifiable.
HuggingChat is powered by Inference Providers, but you can use a different provider like OpenRouter.
#research
Hugging Face is also engaged in research projects that empower the entire ML ecosystem. The goal isn't necessarily to train the best models. Instead, we're trying to create new and interesting research that benefits everyone.
Collaboration >>> competition.
As an example, the The Ultra-Scale Playbook is a culmination of everything the research team has learned about distributed training. It is freely available to anyone who is interested in scaling training of large language models to thousands of GPUs.
After 6+ months in the making and burning over a year of GPU compute time, we're super excited to finally release the "Ultra-Scale Playbook" Check it out here: hf.co/spaces/nanotro… A free, open-source, book to learn everything about 5D parallelism, ZeRO, fast CUDA kernels,
Some other research projects include:
- open-r1: fully reproduce the DeepSeek-R1 model, an open and powerful reasoning model comparable to the OpenAI o-series models.
- SmolLM: release high-quality pretraining datasets (FineWeb, Cosmopedia) and small language and vision language models.
These types of projects enable other researchers to build on top of our work and push the field forward together.
#courses
Educational content, such as the courses and cookbook, at hf.co/learn provide an accessible starting point for learning ML. A large part of our quest hinges on lowering the barrier for everyone to learn ML and convert them into active participants.
There are several courses - agents, reinforcement learning, diffusion, etc. - available on hf.co/learn. You can also find more courses, such as quantization fundamentals, we collaborated on with other learning platforms like DeepLearning.AI.