HuggingFace (🤗) offers models (1M+ as of last count), datasets (like [[Kaggle]]), and [[HuggingFace Spaces]] to build and deploy apps.
HuggingFace provides several libraries: `huggingface_hub` for interacting with the Hub (models, datasets, Spaces), `datasets` for loading and processing datasets, `transformers` for model architectures, `peft` for parameter-efficient finetuning, `trl` for reinforcement learning (e.g. RLHF), and `accelerate` for hardware and distributed-training optimization.
To use HuggingFace, first create a HuggingFace account (and confirm your email address). Then, under Access Tokens, create a new token (API key) of type "Write". Store the key in your [[environment file]]. Don't worry about fine-grained permissions unless you know what you're doing; just select the "Write" tab.
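For local scripts, the token can be loaded from the environment before logging in. A minimal sketch, assuming the token is saved as `HF_TOKEN` in a `.env` file and that `python-dotenv` is installed (the file layout and variable name are assumptions):
```python
import os

from dotenv import load_dotenv       # assumes python-dotenv is installed
from huggingface_hub import login

load_dotenv()                         # read the .env file into the environment
hf_token = os.getenv("HF_TOKEN")      # assumed variable name
login(hf_token)                       # authenticate this session with the Hub
```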
## inference endpoint
HuggingFace makes it easy to run inference in the cloud by deploying a model to an inference endpoint. There is a cost associated, but it can be a good solution for short-term deployments.
```python
from huggingface_hub import InferenceClient

client = InferenceClient(URL, token=hf_token)  # URL is the endpoint URL from the deployment page
client.text_completion(message)
```
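The same client also supports chat-style requests for conversational models. A minimal sketch reusing `client` from above; the message content and `max_tokens` value are illustrative:
```python
response = client.chat_completion(
    messages=[{"role": "user", "content": "Tell a light-hearted joke"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```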
## transformers
```bash
mamba install transformers
```
## pipelines
The `pipeline` module provides a simple interface for [[inference]] on common [[NLP]] tasks. Pass `model=` to specify a model; otherwise HuggingFace selects the default model for that task. If using a GPU, also pass `device='cuda'`.
```python
from huggingface_hub import login
from transformers import pipeline
# Load API key in Colab
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)
# Sentiment Analysis
classifier = pipeline("sentiment-analysis") # device="cuda" if GPU
result = classifier("text to classify")
# Named Entity Recognition
ner = pipeline("ner", grouped_entities=True)
result = ner("text with entities")
# Question Answering with Context
question_answerer = pipeline("question-answering")
result = question_answerer(question="question", context="context containing the answer")
# Text Summarization
summarizer = pipeline("summarization")
text = """Text to summarize"""
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])
# Translation (English to Spanish)
translator = pipeline("translation_en_to_es", model="Helsinki-NLP/opus-mt-en-es")  # no default en->es model
result = translator("Text to translate")
print(result[0]['translation_text'])
# Zero-Shot Classification
classifier = pipeline("zero-shot-classification")
result = classifier(
"Text to classify",
candidate_labels=["technology", "sports", "politics"]
)
# Text Generation
generator = pipeline("text-generation")
result = generator("Beginning of text")
print(result[0]['generated_text'])
```
## models
[[HuggingFace]] uses [[PyTorch]] under the hood to run models. `AutoModelForCausalLM` covers all auto-regressive (decoder-only) models.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "model-id"  # any chat/instruct model on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
# Set input
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Tell a light-hearted joke for a room of Data Scientists"}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Get output
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```
Optionally, use [[quantization]] to reduce the precision of the model weights and shrink the memory footprint.
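For example, 4-bit quantization via `bitsandbytes` can be passed to `from_pretrained`. A minimal sketch, assuming `bitsandbytes` is installed; the model id is a placeholder and the specific settings are common choices, not requirements:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (common settings, adjust as needed)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "model-id",                      # placeholder Hub model id
    device_map="auto",
    quantization_config=quant_config,
)
```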
To free up GPU memory after you are done with a model, delete the references and empty the CUDA cache:
```python
import torch

del inputs, output, model   # drop references to the tensors and the model
torch.cuda.empty_cache()    # release cached GPU memory
```
> [!Tip]- Additional Resources
> - [Learn LLMs on Hugging Face](https://huggingface.co/learn)