The goal of natural language processing (NLP) is to build machines that process natural language, which is to say perform tasks on human language. Common NLP tasks include machine translation, sentiment analysis, question answering, summarization, tagging, and dialog (e.g., Alexa). The three pillars of NLP are probabilistic [[language model|language models]], [[base/Deep Learning/deep learning|deep learning]], and [[word embeddings]].

Applications of NLP include:
- text classification: sentiment analysis, spam detection, topic prediction, author identification, resume screening, language identification, deception detection

[[Computational linguistics]] parallels NLP but focuses on providing insight into human language. [[algorithm|Algorithms]] commonly used in NLP include k-nearest neighbors, hidden Markov models, logistic regression, feed-forward networks, and recurrent neural networks, among others.

[[semantics]]

## sequence to sequence tasks

Sequence-to-sequence tasks are those that require translating one sequence into another. Neither the vocabulary nor the length of the sequence needs to be the same between the input and output. Chatting with a [[large language model]] is a common example of a sequence-to-sequence task in action, where the input is a sequence of tokens and the output is also a sequence of tokens. Other examples include machine translation, summarization, style transfer, grammatical error correction, and morphological inflection, among many others.

[[machine translation]]

## beam search

Rather than greedily selecting the single most probable token at each decoding step, beam search keeps the top $k$ most probable partial sequences (called beams) at each step.

## libraries

- [spaCy](https://spacy.io/)
- [Stanford CoreNLP](https://github.com/stanfordnlp/CoreNLP)

> [!Tip]- Additional Resources
> The [von der Wense Natural Language Processing Group (NALA)](https://nala-cub.github.io/) at CU Boulder
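
The beam search idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: `next_token_probs` and the bigram `table` are hypothetical stand-ins for a real language model's next-token distribution, and beams are scored by cumulative log-probability.

```python
import math

def beam_search(next_token_probs, start, k=3, max_len=5, eos="<eos>"):
    """Keep the k most probable partial sequences (beams) at each step.

    next_token_probs(seq) -> dict mapping token -> probability;
    a hypothetical stand-in for a real model's output distribution.
    """
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:  # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, p in next_token_probs(seq).items():
                candidates.append((seq + [tok], score + math.log(p)))
        # prune: keep only the k highest-scoring sequences
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams

# Toy "model": a fixed bigram table, purely for illustration.
table = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<eos>": 0.2},
    "a":   {"cat": 0.7, "<eos>": 0.3},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}

def next_token_probs(seq):
    return table[seq[-1]]

best_seq, best_score = beam_search(next_token_probs, "<s>", k=2)[0]
# best_seq is ["<s>", "the", "cat", "<eos>"], the highest-probability path
```

With $k = 1$ this reduces to greedy decoding; larger $k$ trades compute for a wider search over candidate sequences.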