Create word cloud using Python
A word cloud (or tag cloud) is a visual representation of text data. It displays a set of words, and the importance of each word is shown by its font size or color. This format is useful for quickly spotting the most prominent terms in a body of text. Word clouds are commonly used to depict keyword metadata (tags) on websites, to analyze customer or employee feedback, and to visualize free-form text. Tags are usually single words. You have probably seen many basic and advanced charts for data visualization, such as scatter plots, histograms, and pie charts. A word cloud is a particularly interesting addition to that toolkit.
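The "size shows importance" idea can be made concrete with a minimal sketch that counts words and scales font size linearly with frequency. The function name `font_sizes` and the 10 to 60 size range are illustrative assumptions. The wordcloud library uses its own scaling, controlled by its `relative_scaling` parameter, so treat this as a conceptual model rather than what the library does internally.

```python
import re
from collections import Counter

def font_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size proportional to its frequency (linear scaling).

    Hypothetical helper for illustration; not part of the wordcloud package.
    """
    # Lowercase, then keep runs of letters, digits, and apostrophes as words
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; others shrink in proportion to
    # their count, but never below min_size
    return {w: max(min_size, round(max_size * c / top)) for w, c in counts.items()}

sizes = font_sizes("data science uses data; data beats opinions")
print(sizes)
```

Here "data" appears three times and every other word once, so "data" is drawn three times larger than the rest.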
The third-party wordcloud package makes it easy to create word clouds and tag clouds in Python. It depends on numpy and pillow.
Open a terminal window and run the following command to install wordcloud with pip:
pip install wordcloud
Besides wordcloud itself, this command installs its supporting packages, such as matplotlib, pillow, numpy, and cycler.
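After installing, you can confirm that the packages are importable. The sketch below uses only the standard library's `importlib.util.find_spec`. The `REQUIRED` list and the `missing_modules` helper are illustrative names of our own. Note that pillow is imported under the name `PIL`.

```python
import importlib.util

# Import names of wordcloud and the supporting packages listed above
# (pillow is imported as "PIL")
REQUIRED = ["wordcloud", "numpy", "PIL", "matplotlib", "cycler"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All word cloud dependencies are installed.")
```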
Data for Wordcloud
A word cloud needs text as input. The text can come from a CSV or Excel file, a plain text file, or scraped web content. In this article, we scrape text from Wikipedia, which requires the wikipedia module:
pip install wikipedia
The following code scrapes the text of the 'Data science' article on Wikipedia, removes the section headings and line breaks, and stores the result in a string variable. We will then use this text to generate and display the word cloud.
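import re

import wikipedia

# fetch the Wikipedia article and take its plain-text content
data = wikipedia.page("Data_science")
text = data.content

# clean data: remove section headings such as "== History =="
text = re.sub(r'==.*?==+', '', text)
# replace line breaks with spaces so words on adjacent lines are not glued together
final_text = text.replace('\n', ' ')
print(final_text)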
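The cleaning step is easiest to check offline. The sketch below applies the heading regex used above to a short, made-up sample. It assumes that the article content marks headings as `== Heading ==` or `=== Subheading ===`, and it adds an optional extra step that collapses leftover whitespace.

```python
import re

# Hypothetical stand-in for wikipedia.page(...).content
sample = (
    "Data science is an interdisciplinary field.\n"
    "== History ==\n"
    "The term was used in the 1960s.\n"
    "=== Modern usage ===\n"
    "Today it combines statistics and computing."
)

# Drop "== ... ==" and "=== ... ===" headings (same pattern as above)
cleaned = re.sub(r'==.*?==+', '', sample)
# Turn line breaks into spaces rather than deleting them
cleaned = cleaned.replace('\n', ' ')
# Extra step (not in the original code): collapse repeated whitespace
cleaned = re.sub(r'\s+', ' ', cleaned).strip()
print(cleaned)
```

Replacing newlines with a space rather than an empty string keeps the last word of one line from being merged with the first word of the next.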
Python Wordcloud Example
Generating and saving a word cloud image takes only a few lines of Python, and the result helps us understand our data better when we work on a Data Science project. Stopwords are common words that carry little meaning on their own, such as 'I', 'we', 'are', 'is', and 'am'. By default, WordCloud removes the most common English stopwords for us. However, if your text corpus contains other stopwords that are not removed, you can always add them to the set of words to exclude.
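To see what stopword removal does to the word counts that a cloud is built from, here is a standard-library-only sketch. `BASE_STOPWORDS` is a tiny hypothetical list standing in for wordcloud's much larger `STOPWORDS` set, and `count_words` is an illustrative helper that is not part of wordcloud. With the real library, the equivalent of `extra_stopwords` is to pass an extended set such as `set(STOPWORDS) | {"data"}` as the `stopwords` argument of `WordCloud`.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; wordcloud ships a much larger one as STOPWORDS
BASE_STOPWORDS = {"i", "we", "are", "is", "am", "the", "and", "of", "a"}

def count_words(text, extra_stopwords=()):
    """Count words after removing the base stopwords plus any user-supplied ones."""
    stopwords = BASE_STOPWORDS | {w.lower() for w in extra_stopwords}
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

text = "We are the data team and data is the product of the team"
print(count_words(text).most_common(3))
# Treat "data" as a stopword too, e.g. because every document mentions it
print(count_words(text, extra_stopwords=["data"]).most_common(3))
```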
The WordCloud class accepts parameters for many configuration settings, such as width, height, background color, and font. To show how easy it is, the code below sets the background color, limits the cloud to at most 200 words, and uses an image as a mask to give the cloud its shape.
import re

import numpy as np
import wikipedia
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

data = wikipedia.page("Data_science")
text = data.content

# clean data: remove section headings, then join lines with spaces
text = re.sub(r'==.*?==+', '', text)
final_text = text.replace('\n', ' ')

# define function to create word cloud
def word_cloud(content):
    # open the mask image; pure white (#FFFFFF) areas are left empty
    array_mask = np.array(Image.open("cloud.png"))
    # set image configurations and generate cloud
    cloud = WordCloud(background_color="white",
                      max_words=200,
                      mask=array_mask,
                      stopwords=set(STOPWORDS))
    cloud.generate(content)
    # save wordcloud to a file
    cloud.to_file("wcloud.png")

word_cloud(final_text)
The mask image is on the left, and the word cloud generated by the above code is on the right.