Named-entity recognition (NER), also known as token classification or text tagging, is the task of taking a sentence and classifying each word (or “token”) into a category, such as person names, location names, or parts of speech.
For example, given the sentence:
Does Chicago have any Pakistani restaurants?
A named-entity recognition algorithm may identify “Chicago” as a location, “Pakistani” as an ethnicity, and so on.
Using Gradio (specifically the HighlightedText component), you can easily build a web demo of your NER model and share it with the rest of your team.
Here is an example of a demo that you’ll be able to build:
This tutorial shows how to take a pretrained NER model and deploy it with a Gradio interface. It covers two different ways to use the HighlightedText component. Depending on your NER model, one of them may be easier to adopt than the other.
Make sure you have the gradio Python package installed. You will also need a pretrained named-entity recognition model. You can use your own; this tutorial uses one from the transformers library.
Many named-entity recognition models output a list of dictionaries. Each dictionary contains an entity label along with a “start” and an “end” character index that mark where the entity appears in the input text. This is, for example, how NER models in the transformers library operate:
from transformers import pipeline
ner_pipeline = pipeline("ner")
ner_pipeline("Does Chicago have any Pakistani restaurants")
Output:
[{'entity': 'I-LOC',
  'score': 0.9988978,
  'index': 2,
  'word': 'Chicago',
  'start': 5,
  'end': 12},
 {'entity': 'I-MISC',
  'score': 0.9958592,
  'index': 5,
  'word': 'Pakistani',
  'start': 22,
  'end': 31}]
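The “start” and “end” values are character offsets into the input string, with “end” exclusive, the same convention as Python slicing. The short check below uses the output above (with the score and index fields dropped for brevity) and slices each span back out of the sentence; labeled_spans is a helper introduced only for this illustration. One caveat: when the pipeline splits a word into sub-word pieces, “word” may contain tokenizer markers (such as a leading ##), so it will not always match the slice exactly.

```python
# Sentence and (trimmed) pipeline output copied from the example above.
text = "Does Chicago have any Pakistani restaurants"
output = [
    {"entity": "I-LOC", "word": "Chicago", "start": 5, "end": 12},
    {"entity": "I-MISC", "word": "Pakistani", "start": 22, "end": 31},
]


def labeled_spans(text, entities):
    """Return (substring, label) pairs using each entity's [start, end) offsets."""
    return [(text[e["start"]:e["end"]], e["entity"]) for e in entities]


print(labeled_spans(text, output))
# [('Chicago', 'I-LOC'), ('Pakistani', 'I-MISC')]
```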
If you have such a model, it is easy to hook it up to Gradio’s HighlightedText component. Return a dictionary with two keys: "text", holding the original input text, and "entities", holding this list of entities.
Here is a complete example:
from transformers import pipeline
import gradio as gr

ner_pipeline = pipeline("ner")

examples = [
    "Does Chicago have any stores and does Joe live here?",
]

def ner(text):
    output = ner_pipeline(text)
    return {"text": text, "entities": output}

demo = gr.Interface(ner,
                    gr.Textbox(placeholder="Enter sentence here..."),
                    gr.HighlightedText(),
                    examples=examples)

demo.launch()
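In this example, HighlightedText only sees the dictionary that ner returns, so the transformers pipeline could in principle be replaced by any function that produces the same shape. The sketch below uses a hypothetical toy_ner function backed by a hard-coded, hypothetical GAZETTEER lookup instead of a trained model. It assumes that "entity", "start" and "end" are the only entity fields the component needs, since score, index and word are omitted.

```python
# HYPOTHETICAL stand-in for a real NER model: a fixed word -> label lookup.
GAZETTEER = {"Chicago": "LOC", "Joe": "PER"}


def toy_ner(text):
    """Return {"text": ..., "entities": [...]} in the dictionary format above."""
    entities = []
    for word, label in GAZETTEER.items():
        start = text.find(word)
        # Record every occurrence; plain substring search would also
        # match inside longer words (e.g. "Joe" inside "Joey").
        while start != -1:
            end = start + len(word)
            entities.append({"entity": label, "start": start, "end": end})
            start = text.find(word, end)
    # Keep entities in reading order.
    entities.sort(key=lambda e: e["start"])
    return {"text": text, "entities": entities}


result = toy_ner("Does Chicago have any stores and does Joe live here?")
print(result["entities"])
```

To try it in the demo, you would pass toy_ner to gr.Interface in place of ner.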
An alternative way to pass data into the HighlightedText component is a list of tuples. The first element of each tuple should be the word or words being classified into a particular entity. The second element should be the entity label (or None if the text should be unlabeled). The HighlightedText component automatically strings the words and labels together to display the entities.
In some cases, this can be easier than the first approach. Here is a demo showing this approach using spaCy’s part-of-speech tagger:
import gradio as gr
import os
os.system('python -m spacy download en_core_web_sm')

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

def text_analysis(text):
    doc = nlp(text)
    # Render the dependency parse as HTML inside a scrollable container
    html = displacy.render(doc, style="dep", page=True)
    html = (
        "<div style='max-width:100%; max-height:360px; overflow:auto'>"
        + html
        + "</div>"
    )
    # Simple statistics shown in the JSON output
    pos_count = {
        "char_count": len(text),
        "token_count": len(doc),
    }
    # Pair each token with its part-of-speech tag; the spaces stay unlabeled (None)
    pos_tokens = []
    for token in doc:
        pos_tokens.extend([(token.text, token.pos_), (" ", None)])
    return pos_tokens, pos_count, html

demo = gr.Interface(
    text_analysis,
    gr.Textbox(placeholder="Enter sentence here..."),
    [gr.HighlightedText(), gr.JSON(), gr.HTML()],
    examples=[
        ["What a beautiful morning for a walk!"],
        ["It was the best of times, it was the worst of times."],
    ],
)

demo.launch()
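The two input formats carry the same information. As a sketch of how they relate, the hypothetical helper entities_to_tuples below turns entity dictionaries (the first approach) into a list of tuples (the second approach), labeling the text between entities with None. It assumes entities do not overlap, and it is our own illustration rather than Gradio's internal code.

```python
def entities_to_tuples(text, entities):
    """Convert [{"entity", "start", "end"}, ...] into [(substring, label), ...].

    Gaps between entities become (substring, None). Assumes non-overlapping spans.
    """
    pieces = []
    cursor = 0  # index of the first character not yet emitted
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["start"] > cursor:
            pieces.append((text[cursor:ent["start"]], None))
        pieces.append((text[ent["start"]:ent["end"]], ent["entity"]))
        cursor = ent["end"]
    if cursor < len(text):
        pieces.append((text[cursor:], None))
    return pieces


text = "Does Chicago have any Pakistani restaurants"
entities = [
    {"entity": "I-LOC", "start": 5, "end": 12},
    {"entity": "I-MISC", "start": 22, "end": 31},
]
print(entities_to_tuples(text, entities))
```

Because the first elements of the tuples concatenate back to the original sentence, either format describes the same highlighted text.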
And you’re done! That’s all you need to know to build a web-based GUI for your NER model.
Fun tip: you can share your NER demo instantly with others simply by setting share=True in launch().