Python: wordcloud, repeated words (checking for repeated words in Python)

This article covers repeated words in Python wordcloud output, along with related material: the error cannot import name 'WordCloud' from 'wordcloud'; ggwordcloud with gradient/transparent text (a ggplot word cloud gradient using adjustcolor); generating a Chinese word cloud in ten minutes with Python + wordcloud + jieba; and generating an English word cloud in ten minutes with Python + wordcloud.

Python: wordcloud, repeated words (checking for repeated words in Python)

My word cloud contains repeated words, and I don't understand why they aren't counted together and shown as a single word.

import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
word_string = 'oh oh oh oh oh oh verse wrote book stand title book would life superman thats make feel count privilege love ideal honored know feel see everyday things things say rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock love rock oh oh oh verse try count ways make smile id run fingers run timeless things talk sugar keeps going make wanna keep lovin strong make wanna try best give want need give whole heart little piece minimum talking everything single wish talking every dream rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock wanna rock bridge theres options dont want theyre worth time cause oh thank like us fine rock sand smile cry joy pain truth lies matter know count oh oh oh oh oh oh rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock love rock oh oh oh oh oh oh wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock wanna rock party people people party popping sitting around see looking looking see look started lets hook little one one come give stuff let freshin ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture party people people party popping sitting around see looking looking see look started lets hook come one give stuff let freshin little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture lets hook come give stuff let freshin little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go lets hook come give stuff let freshin 
little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture black culture black culture black culture black culture'
wordcloud = WordCloud(background_color="white", width=1200, height=1000,
                      stopwords=STOPWORDS).generate(word_string)
plt.imshow(wordcloud)
plt.show()

As you can see, words such as love, rock, black, and culture appear several times and do not seem to be counted together. What am I doing wrong?
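A likely explanation, assuming the library defaults are in effect: WordCloud has a collocations option that is on by default, so frequent two-word phrases such as "black culture" or "rock love" are placed in the cloud as entries of their own, next to the single words. The sketch below uses only the standard library to show how single words and adjacent word pairs end up as separate counts. It illustrates the idea and is not the library's actual algorithm; the sample string and function name are made up for the demonstration.

```python
from collections import Counter


def count_unigrams_and_bigrams(text):
    """Count single words and adjacent word pairs separately."""
    words = text.split()
    unigrams = Counter(words)
    # Each pair of neighbouring words becomes one "phrase" entry
    bigrams = Counter(" ".join(pair) for pair in zip(words, words[1:]))
    return unigrams, bigrams


# Hypothetical sample in the style of the lyrics above
sample = "black culture black culture rock love rock love rock"
unigrams, bigrams = count_unigrams_and_bigrams(sample)

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

If collocations are indeed the cause, passing collocations=False to the WordCloud constructor (for example WordCloud(collocations=False, background_color="white", ...)) should make each word appear only once.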


cannot import name 'WordCloud' from 'wordcloud'

First, a disclaimer: I am a Python beginner, and I ran into a small problem today while practicing a simple word cloud example.

import wordcloud

# Create a WordCloud object from the module
word = wordcloud.WordCloud()
word.generate('hello word,welcome to study python')
word.to_file('pic.png')

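The code is shown above. It looks fine, but running it produced the error in the heading of this section.

My first reaction was to open a command prompt and run pip list to check whether the module was installed; wordcloud and its version number were in the list.

Since only one module is imported, import order could not be the problem. It turned out that the demo script had the same name as the module, so Python found my own wordcloud.py before the installed package. After renaming the file from wordcloud.py to wordcloud_demo.py, the program ran normally and produced a simple word cloud image.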
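The check described above can be sketched with the standard library. The helper name find_shadowing_file and the temporary directory are illustrative assumptions; the point is simply to look for a local file whose name matches the module you are trying to import.

```python
import tempfile
from pathlib import Path


def find_shadowing_file(module_name, script_dir):
    """Return the local .py file that would shadow module_name, or None."""
    candidate = Path(script_dir) / (module_name + ".py")
    return candidate if candidate.is_file() else None


# Simulate a project folder that contains a script named wordcloud.py
with tempfile.TemporaryDirectory() as project_dir:
    (Path(project_dir) / "wordcloud.py").write_text("# demo script\n")
    found = find_shadowing_file("wordcloud", project_dir) is not None
    missing = find_shadowing_file("jieba", project_dir)

print(found)    # a local wordcloud.py exists, so the import is shadowed
print(missing)  # no local jieba.py, so nothing shadows that module
```

A quicker manual check is to run import wordcloud; print(wordcloud.__file__) and see whether the path points at your own script rather than at site-packages.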

 

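ggwordcloud with gradient/transparent text (ggplot word cloud gradient with adjustcolor)

I created a word cloud with ggwordcloud because, unfortunately, I cannot use an alternative word cloud package. So far I have been able to customize ggwordcloud to my requirements, except that I am missing a color gradient that fades to transparent. I have not found a function that allows this.

The code below creates the word cloud, but with a gradient between only two colors. I need a gradient that becomes increasingly transparent (as in the second code example), so that the smallest words are hidden or transparent.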

library(ggwordcloud)
data("love_words_small")
data("love_words")

set.seed(42)
ggplot(
  love_words_small,aes(
    label = word,size = speakers,color = speakers
  )
) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 24) +
  theme_minimal() +
  scale_color_gradient(low = "darkred",high = "red")

The following implementation, which uses the quanteda word cloud package together with adjustcolor, does what I need:

library(quanteda)
library(quanteda.textplots)
set.seed(10)
dfmat1 <- dfm(corpus_subset(data_corpus_inaugural,President == "Obama"),remove = stopwords("english"),remove_punct = TRUE) %>%
   dfm_trim(min_termfreq = 3)
col <- sapply(seq(0.1,1,0.1),function(x) adjustcolor("#1F78B4",x))

textplot_wordcloud(dfmat1,adjust = 0.5,random_order = FALSE,color = col,rotation = FALSE)

Is there any way to carry this solution over to ggwordcloud?

Many thanks for any suggestions!

Solution

I found the solution myself. It was too obvious...

col <- sapply(seq(0.1,1,0.1),function(x) adjustcolor("#1F78B4",x))

library(ggwordcloud)
data("love_words_small")
data("love_words")

set.seed(42)
ggplot(
  love_words_small,aes(
    label = word,size = speakers,color = speakers
  )
) +
  geom_text_wordcloud_area() +
  scale_size_area(max_size = 24) +
  theme_minimal() +
  scale_color_gradientn(colours = col)
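For readers following the Python sections of this article, the alpha ramp that the sapply/adjustcolor line builds can be sketched in Python as well. This is only an illustration of the idea (one base color, ten steps of increasing opacity); the function name alpha_ramp is made up, and the rounding is not guaranteed to match R's in every case.

```python
def alpha_ramp(hex_color, steps=10):
    """Return `steps` #RRGGBBAA strings, from nearly transparent to opaque."""
    base = hex_color.lstrip("#").upper()
    ramp = []
    for i in range(1, steps + 1):
        alpha = round(255 * i / steps)  # alpha channel on a 0..255 scale
        ramp.append("#{}{:02X}".format(base, alpha))
    return ramp


colors = alpha_ramp("#1F78B4")
print(colors[0], colors[-1])
```

The first entry is the most transparent and the last is fully opaque, which is the ordering scale_color_gradientn needs so that low values fade out.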

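Generate a Chinese word cloud in ten minutes with Python + wordcloud + jieba

Introduction

This article uses two Python libraries:

jieba: a Chinese word segmentation tool

wordcloud: a word cloud generator for Python

The previous lesson covered English word clouds. This one explains how to build a Chinese word cloud; after reading it you will be able to generate a word cloud from any Chinese text.

Code overview

Parts of the code come from other people's blogs, but because of bugs and performance problems I changed it substantially.

The first part sets most of the parameters the code needs, so you can use it directly without much modification.

The second part configures jieba; you can also turn Chinese segmentation off with the isCN parameter.

The third part configures wordcloud, including displaying and saving the image.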

Using the code through its comments

You can learn how to use the program in a few minutes just by reading the comments.

# -*- coding: utf-8 -*-
from os import path
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import jieba
# jieba.load_userdict("txt/userdict.txt")
# Optionally load a user dictionary in addition to jieba's built-in one
from wordcloud import WordCloud, ImageColorGenerator

# Directory of the current file.
# __file__ is undefined in an interactive session; there, use
# d = path.dirname('.')
d = path.dirname(__file__)
isCN = 1  # Chinese word segmentation is enabled by default
back_coloring_path = "img/lz1.jpg"  # path of the background (mask) image
text_path = 'txt/lz.txt'  # path of the text to analyze
font_path = r'D:\Fonts\simkai.ttf'  # path of a font with Chinese glyphs
stopwords_path = 'stopwords/stopwords1893.txt'  # stop-word list
imgname1 = "WordCloudDefautColors.png"  # output 1: shaped by the background image only
imgname2 = "WordCloudColorsByImg.png"  # output 2: colored after the background image
my_words_list = ['路明非']  # new words to add to jieba's dictionary
# scipy.misc.imread was removed from SciPy; load the image with Pillow instead
back_coloring = np.array(Image.open(path.join(d, back_coloring_path)))
# Word cloud settings
wc = WordCloud(font_path=font_path,  # font
               background_color="white",  # background color
               max_words=2000,  # maximum number of words shown
               mask=back_coloring,  # background image used as the mask
               max_font_size=100,  # largest font size
               random_state=42,
               # Default image size; with a mask, the saved image takes the
               # mask's size instead. margin is the spacing around words.
               width=1000, height=860, margin=2,
               )
# Add custom words to jieba's dictionary
def add_word(words):
    for word in words:
        jieba.add_word(word)
add_word(my_words_list)
# Assumes the text and stop-word files are UTF-8 encoded
with open(path.join(d, text_path), encoding='utf-8') as f:
    text = f.read()
def jiebaclearText(text):
    # Segment the text, then drop stop words and single-character tokens
    seg_list = jieba.cut(text, cut_all=False)
    with open(path.join(d, stopwords_path), encoding='utf-8') as f_stop:
        stop_words = set(f_stop.read().split('\n'))
    mywordlist = [word for word in seg_list
                  if word.strip() not in stop_words and len(word.strip()) > 1]
    return ' '.join(mywordlist)
if isCN:
    text = jiebaclearText(text)
# Generate the word cloud. generate() takes the full text (wordcloud does not
# segment Chinese well, so enabling segmentation is recommended); alternatively,
# compute word frequencies yourself and call generate_from_frequencies().
wc.generate(text)
# wc.generate_from_frequencies(txt_freq)
# txt_freq example (a dict in current wordcloud versions): {'词a': 100, '词b': 90, '词c': 80}
# Build a color function from the background image
image_colors = ImageColorGenerator(back_coloring)
plt.figure()
# Show the word cloud
plt.imshow(wc)
plt.axis("off")
plt.show()
# Save the image
wc.to_file(path.join(d, imgname1))
# Recolor the word cloud with the colors of the background image
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis("off")
# Show the background image itself
plt.figure()
plt.imshow(back_coloring, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
# Save the recolored image
wc.to_file(path.join(d, imgname2))
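The stop-word filtering and the word-frequency alternative mentioned in the comments can be sketched without jieba or wordcloud installed. The token list below is a hypothetical stand-in for the output of jieba.cut, and the stop words are made up for the demonstration; only the filtering and counting logic is the point.

```python
from collections import Counter


def filter_tokens(tokens, stop_words, min_length=2):
    """Drop stop words and tokens shorter than min_length after stripping."""
    kept = []
    for token in tokens:
        word = token.strip()
        if len(word) >= min_length and word not in stop_words:
            kept.append(word)
    return kept


# Hypothetical, already-segmented input standing in for jieba.cut(...) output
tokens = ["路明非", "的", "故事", "和", "路明非", "朋友", "\n"]
stop_words = {"的", "和", "故事"}

words = filter_tokens(tokens, stop_words)
frequencies = dict(Counter(words))
print(frequencies)
```

A dictionary like frequencies is the shape that wc.generate_from_frequencies(frequencies) expects, which lets you skip generate() entirely when you already have counts.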

Summary

To generate an English word cloud with this code, set isCN to 0 and supply an English stop-word list.

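Generate an English word cloud in ten minutes with Python + wordcloud

A word cloud generated with Python

Word clouds have been a popular topic for the past couple of years. If you spend ten minutes on this article, you may never again envy people who know how to make them. It is not an arcane technique, and you can learn it too. Give it a try!

This article explains how to make an English word cloud; the next one covers Chinese word clouds.

Quick word cloud generation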

from wordcloud import WordCloud
import matplotlib.pyplot as plt

with open('txt/AliceEN.txt', 'r') as fh:
    f = fh.read()
wordcloud = WordCloud(background_color="white", width=1000, height=860, margin=2).generate(f)
# width, height, and margin set the image properties
# generate() tokenizes the whole text automatically, but it handles Chinese poorly; see the next article for Chinese segmentation
# wordcloud = WordCloud(font_path=r'D:\Fonts\simkai.ttf').generate(f)
# font_path selects the font
# background_color sets the background color; the default is black
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
# Save the image; in the third example below, the saved image takes the size of the mask
wordcloud.to_file('test.png')
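Why automatic tokenization works for English but not for Chinese can be seen with a plain regular expression. The pattern below is an assumption chosen to resemble the kind of word-matching pattern wordcloud uses by default; the sample sentences are made up. English words are separated by spaces, while an unsegmented Chinese sentence comes out as one long token.

```python
import re

# Assumed pattern: a word character followed by one or more word characters
# or apostrophes. Not taken from this article's source code.
TOKEN_PATTERN = r"\w[\w']+"


def naive_tokens(text):
    """Split text into tokens using only the regular expression above."""
    return re.findall(TOKEN_PATTERN, text)


english = naive_tokens("Alice was beginning to get very tired")
chinese = naive_tokens("爱丽丝开始觉得很累")

print(english)
print(chinese)
```

This is why the Chinese tutorial runs the text through jieba first and joins the segments with spaces before calling generate().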

Custom font colors

This code comes mainly from the wordcloud GitHub repository, where you can download the example.

#!/usr/bin/env python
"""
Colored by Group Example
========================
Generating a word cloud that assigns colors to words based on
a predefined mapping from colors to words
"""
from wordcloud import (WordCloud, get_single_color_func)
import matplotlib.pyplot as plt
class SimpleGroupedColorFunc(object):
    """Create a color function object which assigns EXACT colors
       to certain words based on the color to words mapping
       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.
       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """
    def __init__(self, color_to_words, default_color):
        self.word_to_color = {word: color
                              for (color, words) in color_to_words.items()
                              for word in words}
        self.default_color = default_color
    def __call__(self, word, **kwargs):
        return self.word_to_color.get(word, self.default_color)
class GroupedColorFunc(object):
    """Create a color function object which assigns DIFFERENT SHADES of
       specified colors to certain words based on the color to words mapping.
       Uses wordcloud.get_single_color_func
       Parameters
       ----------
       color_to_words : dict(str -> list(str))
         A dictionary that maps a color to the list of words.
       default_color : str
         Color that will be assigned to a word that's not a member
         of any value from color_to_words.
    """
    def __init__(self, color_to_words, default_color):
        self.color_func_to_words = [
            (get_single_color_func(color), set(words))
            for (color, words) in color_to_words.items()]
        self.default_color_func = get_single_color_func(default_color)
    def get_color_func(self, word):
        """Returns a single_color_func associated with the word"""
        try:
            color_func = next(
                color_func for (color_func, words) in self.color_func_to_words
                if word in words)
        except StopIteration:
            color_func = self.default_color_func
        return color_func
    def __call__(self, word, **kwargs):
        return self.get_color_func(word)(word, **kwargs)
text = """The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!"""
# Since the text is small collocations are turned off and text is lower-cased
wc = WordCloud(collocations=False).generate(text.lower())
# Customize the color of every word
color_to_words = {
    # words below will be colored with a green single color function
    '#00ff00': ['beautiful', 'explicit', 'simple', 'sparse',
                'readability', 'rules', 'practicality',
                'explicitly', 'one', 'now', 'easy', 'obvious', 'better'],
    # will be colored with a red single color function
    'red': ['ugly', 'implicit', 'complex', 'complicated', 'nested',
            'dense', 'special', 'errors', 'silently', 'ambiguity',
            'guess', 'hard']
}
# Words that are not in any of the color_to_words values
# will be colored with a grey single color function
default_color = 'grey'
# Create a color function with single tone
# grouped_color_func = SimpleGroupedColorFunc(color_to_words, default_color)
# Create a color function with multiple tones
grouped_color_func = GroupedColorFunc(color_to_words, default_color)
# Apply our color function
# color_func can also be derived from an image; see the next section for details
wc.recolor(color_func=grouped_color_func)
# Plot
plt.figure()
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
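The two classes above are callables that return a color for each word. A plain function works the same way. The sketch below is a minimal, hypothetical color function (the length rule and the colors are arbitrary choices for illustration); it accepts the keyword arguments that wordcloud passes and ignores the ones it does not need.

```python
def length_color_func(word, font_size=None, position=None,
                      orientation=None, random_state=None, **kwargs):
    """Color long words red and short words blue (an arbitrary demo rule)."""
    if len(word) > 6:
        return "hsl(0, 80%, 40%)"
    return "hsl(210, 80%, 40%)"


long_color = length_color_func("readability")
short_color = length_color_func("ugly")
print(long_color, short_color)
```

It can be applied in the same way as the grouped version, with wc.recolor(color_func=length_color_func).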
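
Generating a word cloud from a background image and setting stop words

This code also comes mainly from the wordcloud GitHub repository, where you can download the example along with the original image and the result images.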

#!/usr/bin/env python
"""
Image-colored wordcloud
=======================
You can color a word-cloud by using an image-based coloring strategy
implemented in ImageColorGenerator. It uses the average color of the region
occupied by the word in a source image. You can combine this with masking -
pure-white will be interpreted as 'don't occupy' by the WordCloud object when
passed as mask.
If you want white as a legal color, you can just pass a different image to
"mask", but make sure the image shapes line up.
"""
from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
d = path.dirname(__file__)
# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()
# read the mask / color image taken from
# http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010
alice_coloring = np.array(Image.open(path.join(d, "alice_color.png")))
# Set the stop words
stopwords = set(STOPWORDS)
stopwords.add("said")
# The mask parameter sets the shape of the word cloud
wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)
# generate word cloud
wc.generate(text)
# create coloring from image
image_colors = ImageColorGenerator(alice_coloring)
# show
# With only the mask set, you get a word cloud in the shape of the image
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.figure()
# recolor wordcloud and show
# we could also give color_func=image_colors directly in the constructor
# this way the font colors follow the color layout of the given image
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")
plt.figure()
plt.imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
plt.axis("off")
plt.show()
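The mask rule described in the docstring (pure white means "do not draw here") can be illustrated without an image file. The sketch below builds a small circular mask as a nested list, with 255 for blocked cells and 0 for drawable ones. The function name and the grid size are made up, and a real mask would normally be loaded from an image as in the code above.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside the circle, 255 (white) outside."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


mask = circle_mask(9)
for row in mask:
    # '#' marks cells where words may be placed, '.' marks masked-out cells
    print("".join("#" if cell == 0 else "." for cell in row))
```

To use such a grid with WordCloud, it would first need to be converted to a NumPy array, for example np.array(mask, dtype=np.uint8), and passed as mask=.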

The results are shown below (images c.png, d.png, e.png).