HuggingFace (🤗) offers models (1M+ at last count), datasets (like [[Kaggle]]), and [[HuggingFace Spaces]] to build and deploy apps. HuggingFace provides several libraries: `huggingface_hub` for interacting with the Hub (models, datasets, Spaces), `datasets` for loading and processing datasets, `transformers` for model architectures and pretrained models, `peft` for parameter-efficient fine-tuning, `trl` for reinforcement learning, and `accelerate` for hardware optimization.

To use HuggingFace, first create a HuggingFace account (and confirm your email address). Then create a new token (API key) of type "Write" under Access Tokens, and store the key in your [[environment file]] (a sketch for loading it from there is at the end of this note). Don't worry about the fine-grained permissions unless you know what you're doing; just select the "Write" tab.

## inference endpoint

HuggingFace makes it very easy to run inference in the cloud by deploying a model to an inference endpoint. There is a cost associated, but it can be a good solution for short-term deployments.

```python
from huggingface_hub import InferenceClient

# URL is the deployed endpoint's URL; hf_token is your access token
client = InferenceClient(URL, token=hf_token)
client.text_completion(message)
```

## transformers

```bash
mamba install transformers
```

## pipelines

The `pipeline` module provides a very simple interface for [[inference]] on common [[NLP]] tasks. Provide `model=` to specify a model; otherwise HuggingFace selects the default model for that task. If using a GPU, also supply `device='cuda'`.

```python
from huggingface_hub import login
from transformers import pipeline

# Load API key in Colab
from google.colab import userdata
hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)

# Sentiment Analysis
classifier = pipeline("sentiment-analysis")  # add device="cuda" if using a GPU
result = classifier("text to classify")

# Named Entity Recognition
ner = pipeline("ner", grouped_entities=True)
result = ner("text with entities")

# Question Answering with Context
question_answerer = pipeline("question-answering")
result = question_answerer(question="question", context="context containing the answer")

# Text Summarization
summarizer = pipeline("summarization")
text = """Text to summarize"""
summary = summarizer(text, max_length=50, min_length=25, do_sample=False)
print(summary[0]['summary_text'])

# Translation English to Spanish
translator = pipeline("translation_en_to_es")  # pass model="Helsinki-NLP/opus-mt-en-es" if this pair has no default model
result = translator("Text to translate")
print(result[0]['translation_text'])

# Zero-Shot Classification
classifier = pipeline("zero-shot-classification")
result = classifier(
    "Text to classify",
    candidate_labels=["technology", "sports", "politics"]
)

# Text Generation
generator = pipeline("text-generation")
result = generator("Beginning of text")
print(result[0]['generated_text'])
```

## models

[[HuggingFace]] uses [[PyTorch]] under the hood to run models. `CausalLM` covers all auto-regressive (decoder-only) models.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model for a given model_name from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Set input
messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Tell a light-hearted joke for a room of Data Scientists"}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Get output
output = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```

Optionally, use [[quantization]] to reduce the precision of the model weights (a hedged sketch is included at the end of this note).

To free up space on the GPU after you are done with a model, delete the tensors and empty the cache:

```python
del inputs, output, model
torch.cuda.empty_cache()
```

> [!Tip]- Additional Resources
> - [Learn LLMs on Hugging Face](https://huggingface.co/learn)
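The setup section above suggests keeping the key in an [[environment file]]. Outside of Colab, one common (but not required) approach is `python-dotenv`; this is a minimal sketch, and the file name `.env` and variable name `HF_TOKEN` are assumptions, not anything HuggingFace mandates.

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed
from huggingface_hub import login

load_dotenv()  # reads key=value pairs from a local .env file into the environment
hf_token = os.getenv("HF_TOKEN")  # assumed variable name inside the .env file
login(hf_token, add_to_git_credential=True)
```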
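The models section mentions [[quantization]] only in passing; here is a minimal sketch of loading a causal LM in 4-bit precision. It assumes the `bitsandbytes` package is installed and a CUDA GPU is available, and `model_name` is a placeholder for any causal LM on the Hub.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config (assumes bitsandbytes and a CUDA GPU)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# model_name is a placeholder; pass the quantization config at load time
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```

Loading in 4-bit roughly quarters the GPU memory needed for the weights, at some cost in output quality.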