How You Can Easily Perform NLP Tasks With Hugging Face Pipeline
The Complete Guide to Learn Hugging Face Pipelines for Beginners
In the ever-changing field of Artificial Intelligence and Natural Language Processing, staying up to date with new tools and frameworks is a constant challenge.
One framework that fills this gap and has become a star within the AI community is Hugging Face; its ease of use and powerful capabilities have won it a devoted following among developers.
In this blog post, we will take an in-depth look at the Hugging Face pipeline and how it makes it easy to use pre-trained models for various NLP tasks.
Hugging Face
Hugging Face is an open-source library that has transformed how developers and researchers work in NLP.
It gives you access to an extensive range of pre-trained models and tools through a simple, well-documented, and coherent API: just a few lines of code harness the full power of complex NLP pipelines.
At the heart of this simplicity is the pipeline function, the easiest way to apply a pre-trained model to a given task.
It abstracts away the complexity of setting up and running these models, making state-of-the-art NLP models usable even by people with limited machine learning experience.
Supported Tasks
The Hugging Face pipeline supports the following common tasks out of the box:
Text Processing:
Sentiment Analysis
Text Classification
Named Entity Recognition
Question Answering
Text Generation
Text Summarization
Translation
Image Processing:
Image Classification
Object Detection
Image Segmentation
Audio Processing:
Audio Classification
Speech Recognition
This flexibility makes the pipeline function a very handy tool for applications across many domains.
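Each task in the list above corresponds to an identifier string that you pass to the pipeline function. As a quick reference, here is a partial mapping (the complete, up-to-date list lives in the transformers documentation):

```python
# Partial mapping from the human-readable task names above to the
# identifier strings that pipeline() expects.
task_ids = {
    "Sentiment Analysis": "sentiment-analysis",
    "Named Entity Recognition": "ner",
    "Question Answering": "question-answering",
    "Text Generation": "text-generation",
    "Text Summarization": "summarization",
    "Translation": "translation",
    "Image Classification": "image-classification",
    "Object Detection": "object-detection",
    "Audio Classification": "audio-classification",
    "Speech Recognition": "automatic-speech-recognition",
}

# Any of these identifiers can be passed straight to pipeline(), e.g.:
#     ner_tagger = pipeline("ner")
```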
Getting Started with Hugging Face Pipeline
Before you start using the Hugging Face pipeline, you will need to set up your environment. The following steps will guide you:
Install the Required Libraries
pip install transformers datasets
Import the Pipeline Function
from transformers import pipeline
Create a Classifier by Specifying the Task
classifier = pipeline("sentiment-analysis")
That's it! You now have a pre-trained model ready to perform sentiment analysis on any text you provide.
Sentiment Analysis: A Practical Example
Now, let's see how the Hugging Face pipeline can be used for sentiment analysis in practice.
In this case, the classification task will be identifying whether a given text expresses a positive sentiment or a negative sentiment.
from transformers import pipeline

# Create the sentiment analysis classifier
classifier = pipeline("sentiment-analysis")

# Analyze a single sentence
result = classifier("Thanks a lot for watching the video. Really appreciate it!")
print(result)

# Analyze multiple sentences
texts = [
    "Thanks a lot guys!",
    "This video is not cool.",
    "This video is cool."
]
results = classifier(texts)
for result in results:
    print(f"Label: {result['label']}, Score: {result['score']:.4f}")
Running this code, you will see that the model correctly identifies the sentiment of each sentence, returning a label (POSITIVE or NEGATIVE) and a confidence score for each.
Understand the Results
Let's dissect the results:
"Thanks a lot for watching the video. Really appreciate it!"
The model classifies it as POSITIVE with high confidence, thanks to expressions like "thanks" and "appreciate" and the generally warm tone.
"Thanks a lot guys!"
Again classified as POSITIVE with high confidence; expressions of gratitude carry positive sentiment.
"This video is not cool."
Classified NEGATIVE with high confidence.
In this example, "not" precedes a positive word ("cool"); the negation flips the sentiment.
"This video is cool."
Classified POSITIVE with high confidence.
Removing the negation returns the sentiment to positive.
This example demonstrates that the model can capture negations, which completely change the meaning of a sentence.
What Happens Behind the Scenes
Although the pipeline function makes it all look magical, a lot is happening behind the scenes:
Model Download: When you create a pipeline, it automatically downloads a default pre-trained model suitable for the specified task.
It also downloads and caches a tokenizer. This tokenizer will take your input text and convert it into a format the model understands—usually breaking it down into smaller units called tokens.
Finally, the pipeline passes these tokens through the model to generate an output, which it interprets and returns in a user-friendly format.
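To make these steps concrete, here is a rough sketch of what pipeline("sentiment-analysis") does internally. The checkpoint name below is the default for this task at the time of writing; it may change between transformers versions:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# The default checkpoint behind pipeline("sentiment-analysis")
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# 1. Tokenize: convert text into token IDs the model understands
inputs = tokenizer("This video is cool.", return_tensors="pt")

# 2. Forward pass: token IDs -> raw scores (logits)
with torch.no_grad():
    logits = model(**inputs).logits

# 3. Post-process: logits -> human-readable label plus a confidence score
probs = torch.softmax(logits, dim=-1)
label_id = probs.argmax(dim=-1).item()
print(model.config.id2label[label_id], round(probs[0, label_id].item(), 4))
```

This is exactly the tokenize, forward, post-process sequence described above, just written out by hand.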
All of this happens automatically, leaving you free to focus on the more interesting aspects of working with a model rather than the technical details of how it is implemented.
Beyond Sentiment Analysis
While our example has focused on sentiment analysis, the Hugging Face pipeline can do much more.
Here are some of the other tasks you might like to try:
Text Generation
generator = pipeline("text-generation")
result = generator("In a world where AI", max_length=50, num_return_sequences=1)
print(result[0]['generated_text'])
This generates a creative continuation of the provided prompt.
Translation
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
result = translator("Hello, how are you?")
print(result[0]['translation_text'])
This would translate the English text into French.
Text Summarization
summarizer = pipeline("summarization")
long_text = "..." # Your long text here
summary = summarizer(long_text, max_length=130, min_length=30, do_sample=False)
print(summary[0]['summary_text'])
This would summarize a long article into a brief text.
Best Practices
1. Select the right model:
While the default model works fine for most use cases, it is worth browsing the Hugging Face Model Hub for task-specific models that may perform better on your particular use case.
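For example, you can choose a specific model from the Hub by passing its name via the model argument. The checkpoint below is a real multilingual sentiment model on the Hub that rates text from 1 to 5 stars; substitute whatever suits your use case:

```python
from transformers import pipeline

# Explicitly choose a model from the Hub rather than the task default.
classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment",
)

# This particular model handles several languages and returns star ratings.
result = classifier("Ce film était excellent !")
print(result)
```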
2. Handle Large Inputs:
Some models have a maximum input length. For large texts, consider breaking them into smaller parts for analysis.
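One simple approach, sketched below, splits a long text into overlapping word-based chunks. Counting words is only a rough proxy for the model's token limit; for precise control you would count tokens with the model's tokenizer instead:

```python
def chunk_text(text, max_words=400, overlap=50):
    """Split a long text into overlapping word-based chunks.

    A simple heuristic: word counts approximate token counts, and the
    overlap preserves some context across chunk boundaries.
    """
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

# Analyze each chunk separately, then combine the results as needed:
#     results = [classifier(chunk) for chunk in chunk_text(long_document)]
```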
3. Play with the Parameters:
Many pipeline tasks provide you the ability to tune their parameters, such as max_length or num_return_sequences. Play with these settings to fine-tune your results.
4. Watch for Biases:
Pre-trained models tend to inherit biases from their training data, so evaluate their output critically; this is especially important in sensitive applications.
5. Think About Fine-tuning:
For domain-specific tasks, you may get better results by fine-tuning a pre-trained model on your dataset.
Conclusion
The Hugging Face pipeline is a big step toward bringing state-of-the-art NLP to a broader audience. Its simplicity sacrifices no power, delivering advanced performance with only a few lines of code.
We have seen how a task that once required solid machine-learning knowledge and heavy setup becomes just a few lines of code. This democratization of AI technology opens up exciting opportunities for developers, researchers, and businesses.
Whether you are building a sentiment analysis tool for customer feedback, a language translation service, or a creative text generator, the Hugging Face pipeline gives you a solid foundation to build on.
Lastly, note that the pipeline only scratches the surface. As you move further along your NLP path, Hugging Face has a lot more to offer in its pool of resources, including more advanced APIs that give you closer control over your models.
So what are you waiting for? Experiment, and see the potential NLP holds for your projects. A whole universe of natural language processing is available to you.