Introducing NLTK
The Natural Language Toolkit (NLTK) is a Python platform for building programs that analyze text.
Installing NLTK
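pip install nltk
import nltk
nltk.download()  # opens the NLTK downloader; the examples below use the stopwords corpus and the Punkt tokenizer data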
Stop Words
Some words, such as "the", "is" and "and", carry little meaning on their own, so we often filter them out to make the remaining text easier for the computer to analyze. In natural language processing (NLP), these words are called stop words.
from nltk.corpus import stopwords
# NLTK names its stop word lists in lowercase
print(set(stopwords.words('arabic')))
print(set(stopwords.words('english')))
How can we remove the stop words from our own text? The example below shows how we can perform this task:
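from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
data = 'All work and no play'
# NLTK names its stop word lists in lowercase
stopword = set(stopwords.words('english'))
# Split the text into word tokens
word = word_tokenize(data)
wordsfilter = []
for w in word:
    # The stop word list is lowercase, so compare in lowercase to catch 'All'
    if w.lower() not in stopword:
        wordsfilter.append(w)
print(wordsfilter)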
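The same filtering step can be wrapped in a small reusable function. The sketch below is only an illustration: the name remove_stop_words and the tiny sample_stop_words set are assumptions made for this example, not part of NLTK. The hand-picked set lets the snippet run without downloading any NLTK data; in practice you would pass set(stopwords.words('english')) and the output of word_tokenize instead.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    # Normalize the stop words once so the comparison is case-insensitive
    stop_words = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_words]


# Hand-picked sample set for illustration only (an assumption, not NLTK's list)
sample_stop_words = {"all", "and", "no"}

# str.split() stands in for word_tokenize here to keep the sketch self-contained
tokens = "All work and no play".split()

filtered = remove_stop_words(tokens, sample_stop_words)
print(filtered)  # ['work', 'play']
```

Passing the stop words in as a parameter keeps the function independent of any one language, so the same code could be used with the Arabic list as well.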
treebank
------------------------------
Mostafa Nabieh
------------------------------
#DataandAILearning #AIandDSSkills