Lesk Algorithm In Python To Remove Word Ambiguity

In this article, we will see how to resolve word ambiguity in Python using the Lesk algorithm.
 
For example, in the sentences below, the word “bank” has different meanings based on the context of the sentence.
 
Text1 = 'I went to the bank to deposit my money'
Text2 = 'The river bank was full of dead fishes'
 
The Lesk algorithm is the seminal dictionary-based method for word sense disambiguation (WSD).
 
This is the definition from Wikipedia: "It is based on the hypothesis that words used together in text are related to each other and that the relation can be observed in the definitions of the words and their senses. Two (or more) words are disambiguated by finding the pair of dictionary senses with the greatest word overlap in their dictionary definitions. It searches for the shortest path between two words: the second word is iteratively searched among the definitions of every semantic variant of the first word, then among the definitions of every semantic variant of each word in the previous definitions and so on. Finally, the first word is disambiguated by selecting the semantic variant which minimizes the distance from the first to the second word."
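
The word-overlap idea in this definition can be illustrated with a minimal sketch. It is not pywsd's implementation: the two glosses in `TOY_SENSES`, the stop-word list, and the helper names `content_words` and `toy_lesk` are all made up for illustration, and only the gloss-overlap step is shown (not the shortest-path search the quote also mentions).

```python
import re

# Hypothetical mini-dictionary: these glosses were written for this sketch,
# they are not taken from WordNet.
TOY_SENSES = {
    "bank (financial)": "an institution where people deposit and borrow money",
    "bank (river)": "the sloping land along the edge of a river or lake",
}

# A tiny stop-word list; real implementations use a much larger one.
STOP_WORDS = {"a", "an", "and", "i", "my", "of", "or", "the", "to", "was", "where"}


def content_words(text):
    """Lowercase the text, split it into words and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def toy_lesk(sentence, target, senses):
    """Return the sense whose gloss shares the most words with the sentence,
    together with the sorted list of shared words."""
    context = content_words(sentence) - {target}
    best_sense, best_overlap = None, set()
    for sense, gloss in senses.items():
        overlap = context & content_words(gloss)
        if best_sense is None or len(overlap) > len(best_overlap):
            best_sense, best_overlap = sense, overlap
    return best_sense, sorted(best_overlap)


print(toy_lesk("I went to the bank to deposit my money", "bank", TOY_SENSES))
# ('bank (financial)', ['deposit', 'money'])
print(toy_lesk("The river bank was full of dead fishes", "bank", TOY_SENSES))
# ('bank (river)', ['river'])
```

With these toy glosses, "deposit" and "money" pull the first sentence towards the financial sense, while "river" pulls the second towards the riverside sense.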
 
Basically, the sense of a word is chosen from the meanings of the words nearest to it. The following is a simplified pictorial representation of the idea.

 
Let's see the code to implement the Lesk algorithm in Python.
 
First, install the pywsd library, a Python implementation of Word Sense Disambiguation (WSD):

    # Install pywsd (run this in a shell, not in Python)
    pip install pywsd

Then import simple_lesk and call it on the first sentence:

    # Import functions
    from pywsd.lesk import simple_lesk

    sentences = ['I went to the bank to deposit my money',
                 'The river bank was full of dead fishes']

    # calling the lesk function and printing results for both the sentences
    print("Context-1:", sentences[0])
    answer = simple_lesk(sentences[0], 'bank')
    print("Sense:", answer)
    print("Definition : ", answer.definition())

Result:

    Context-1: I went to the bank to deposit my money
    Sense: Synset('depository_financial_institution.n.01')
    Definition : a financial institution that accepts deposits and channels the money into lending activities

Now the second sentence:

    print("Context-2:", sentences[1])
    answer = simple_lesk(sentences[1], 'bank')
    print("Sense:", answer)
    print("Definition : ", answer.definition())

Result:

    Context-2: The river bank was full of dead fishes
    Sense: Synset('bank.n.01')
    Definition : sloping land (especially the slope beside a body of water)
 
Observe that in context-1, “bank” is a financial institution, but in context-2, “bank” is sloping land.
 
Here is another example, this time with the word “plant”:

    new_sentences = ['The workers at the plant were overworked',
                     'The plant was no longer bearing flowers',
                     'The workers at the industrial plant were overworked']

    # calling the lesk function and printing results
    print("Context-1:", new_sentences[0])
    answer = simple_lesk(new_sentences[0], 'plant')
    print("Sense:", answer)
    print("Definition : ", answer.definition())

Result, not exactly as expected (a verb sense is chosen, although “plant” is used as a noun here):

    Context-1: The workers at the plant were overworked
    Sense: Synset('plant.v.06')
    Definition : put firmly in the mind

For the second sentence:

    print("Context-2:", new_sentences[1])
    answer = simple_lesk(new_sentences[1], 'plant')
    print("Sense:", answer)
    print("Definition : ", answer.definition())

Result, closer to what we expect (the sense relates to vegetation, although it is again a verb sense while “plant” is used as a noun here):

    Context-2: The plant was no longer bearing flowers
    Sense: Synset('plant.v.01')
    Definition : put or set (seeds, seedlings, or plants) into the ground

For the third sentence:

    print("Context-3:", new_sentences[2])
    answer = simple_lesk(new_sentences[2], 'plant')
    print("Sense:", answer)
    print("Definition : ", answer.definition())

Result, as expected. One extra word ("industrial") can make a difference in context:

    Context-3: The workers at the industrial plant were overworked
    Sense: Synset('plant.n.01')
    Definition : buildings for carrying on industrial labor
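
One plausible way to see why a single extra word can change the result is to count word overlaps by hand. The sketch below uses only the three “plant” definitions printed above. The stop-word list and the helper names `tokens` and `overlap_scores` are assumptions made for illustration. It compares the sentence against the bare definitions only, so it does not reproduce the scores pywsd computes internally.

```python
import re

# Definitions copied from the outputs shown in this article; the keys are the
# synset names printed there.
PLANT_GLOSSES = {
    "plant.v.06": "put firmly in the mind",
    "plant.v.01": "put or set (seeds, seedlings, or plants) into the ground",
    "plant.n.01": "buildings for carrying on industrial labor",
}

# Small stop-word list chosen for this sketch only.
STOP_WORDS = {"at", "for", "in", "into", "no", "on", "or", "the", "was", "were"}


def tokens(text):
    """Lowercase the text, split it into words and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def overlap_scores(sentence, target, glosses):
    """Count the content words shared by the sentence and each definition."""
    context = tokens(sentence) - {target}
    return {sense: len(context & tokens(gloss)) for sense, gloss in glosses.items()}


without_hint = overlap_scores(
    "The workers at the plant were overworked", "plant", PLANT_GLOSSES)
with_hint = overlap_scores(
    "The workers at the industrial plant were overworked", "plant", PLANT_GLOSSES)

print(without_hint)  # {'plant.v.06': 0, 'plant.v.01': 0, 'plant.n.01': 0}
print(with_hint)     # {'plant.v.06': 0, 'plant.v.01': 0, 'plant.n.01': 1}
```

Without "industrial", none of the three definitions shares a word with the sentence, so an overlap-based method has nothing to separate them. With it, the factory sense is the only one that scores.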
 
Simple Lesk sits somewhere in between: it uses more signature words than the original Lesk algorithm (1986) but fewer than adapted Lesk (Banerjee and Pedersen, 2002).
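
The effect of a larger signature can be sketched as follows. The two definitions are the “bank” definitions printed earlier in this article. The example sentences, the stop-word list, and the helpers `tokens`, `signature` and `scores` are assumptions added for illustration. Example sentences merely stand in for "extra signature words" here; which sources pywsd actually draws on is not shown in this article.

```python
import re

# Definitions are the ones printed earlier in this article.
# The example sentences are assumptions added for this sketch.
BANK_SENSES = {
    "depository_financial_institution.n.01": {
        "gloss": "a financial institution that accepts deposits and "
                 "channels the money into lending activities",
        "examples": ["he cashed a check at the bank"],
    },
    "bank.n.01": {
        "gloss": "sloping land (especially the slope beside a body of water)",
        "examples": ["he sat on the bank of the river and watched the currents"],
    },
}

# Small stop-word list chosen for this sketch only.
STOP_WORDS = {"a", "an", "and", "at", "he", "into", "of", "on", "that", "the", "was"}


def tokens(text):
    """Lowercase the text, split it into words and drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def signature(sense, use_examples):
    """Build the set of signature words for one sense."""
    words = tokens(sense["gloss"])
    if use_examples:
        for example in sense["examples"]:
            words |= tokens(example)
    return words


def scores(sentence, target, senses, use_examples):
    """Count the words shared by the sentence and each sense's signature."""
    context = tokens(sentence) - {target}
    return {name: len(context & signature(sense, use_examples))
            for name, sense in senses.items()}


sentence = "The river bank was full of dead fishes"
gloss_only = scores(sentence, "bank", BANK_SENSES, use_examples=False)
widened = scores(sentence, "bank", BANK_SENSES, use_examples=True)

print(gloss_only)  # {'depository_financial_institution.n.01': 0, 'bank.n.01': 0}
print(widened)     # {'depository_financial_institution.n.01': 0, 'bank.n.01': 1}
```

With definitions alone, the river sentence matches neither sense. Once the signature is widened, "river" is found and the riverside sense wins.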

