QQ Chat History Analysis Tool

Notice

1. This is a completely free tool.

2. I am not a professional programmer, so I cannot guarantee that the tool is free of bugs. If you find any bug while using it, please contact me. Apart from problems caused by the user's own computer, most issues can be fixed promptly.

3. This tool was developed in Python 3.6; I wrote it during the National Day holiday while practicing Tkinter. Compared with the tool I previously wrote in Matlab, the Python version does not require installing a separate runtime: just download, unzip, and double-click the .exe file. The downside is that the packaged files are rather large.

Download

Option 1. Download the executable (packed in a zip file) and use it directly, without installing Python.

QQ Chat History Analyzer (executable)

Option 2. If you know some Python and have Python 3 installed, you can download the zip file below. It contains a .py file and a stop-words dictionary; put the two in the same directory and run the .py file. Note that the script imports several modules: re, pandas, numpy, tkinter, sklearn, matplotlib, networkx, jieba, wordcloud, and zhon. Of these, re and tkinter normally come with Python itself; the others must be installed separately (sklearn is installed under the package name scikit-learn).
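Before running the script, it may help to check which of these modules are still missing. The short sketch below uses importlib.util.find_spec from the standard library to look each module up without importing it; the missing_modules helper is an illustrative name, not part of the tool.

```python
import importlib.util

# Module names as the script imports them (the pip package for sklearn is scikit-learn).
REQUIRED = ["re", "pandas", "numpy", "tkinter", "sklearn", "matplotlib",
            "networkx", "jieba", "wordcloud", "zhon"]


def missing_modules(names=REQUIRED):
    """Return the top-level modules from `names` that this Python environment cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```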

QQ Chat History Analyzer (Python source code)


Demo Video


Features

QQ is the most popular instant-messaging tool in China. A QQ chat history holds a great deal of information, from which some interesting findings can be derived. This tool analyzes QQ group chats or private chats, covering time-series analysis, basic features of each member (chatting frequency, active hours, how often they start or end a topic, …), textual analysis (word clouds, chat summaries), association analysis ("me" vs. others, social network graph), and more.

Users export the chat history from QQ as a .txt file and then import it into this tool. Two steps are required: preprocessing the .txt file (using regular expressions to extract the messages one by one) and specifying the nicknames. After that, users can run the various analyses. A grey button means an analysis is not available yet because a prerequisite step has to be run first; a black button means it is available now.
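The regular-expression preprocessing step can be pictured with the sketch below. It assumes that each message in the exported .txt file begins with a header line such as "2017-10-01 9:05:03 Nickname(12345678)" (or "Nickname<address>") followed by the message text; the exact export format and the pattern the tool actually uses are not given here, so the pattern and the parse_qq_export helper are hypothetical. Banner lines before the first header are skipped.

```python
import re
from datetime import datetime

# Assumed header layout (hypothetical): "YYYY-MM-DD H:MM:SS Nickname(account)" or "... Nickname<account>".
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}) (.+?)[(<]([^()<>]+)[)>]\s*$")


def parse_qq_export(text):
    """Split an exported chat history into dicts with time, nickname, account and body."""
    messages = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new header closes the previous message.
            if current is not None:
                messages.append(current)
            stamp, nickname, account = match.groups()
            current = {
                "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
                "nickname": nickname.strip(),
                "account": account,
                "body": [],
            }
        elif current is not None and line.strip():
            # Non-blank lines after a header belong to that message's body.
            current["body"].append(line.strip())
    if current is not None:
        messages.append(current)
    for message in messages:
        message["body"] = "\n".join(message["body"])
    return messages
```

A message whose text itself starts with a timestamp-like line would be misread as a new header; a real parser would need to handle such edge cases.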

Below are some example results. For privacy, all nicknames have been replaced with A, B, C, D, and so on.

Let’s start with time-related statistics: counts by year, month, and day, as well as by hour of the day.
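Once the messages are extracted, these time statistics reduce to counting timestamps per bucket. The tool itself imports pandas for this kind of work; the sketch below uses only the standard library, and time_profile is a hypothetical name used for illustration.

```python
from collections import Counter
from datetime import datetime


def time_profile(timestamps):
    """Count messages per year, per (year, month), per calendar day and per hour of the day."""
    return {
        "year": Counter(t.year for t in timestamps),
        "month": Counter((t.year, t.month) for t in timestamps),
        "day": Counter(t.date() for t in timestamps),
        "hour": Counter(t.hour for t in timestamps),
    }


stamps = [datetime(2017, 10, 1, 9, 5), datetime(2017, 10, 1, 23, 40), datetime(2018, 1, 2, 9, 0)]
profile = time_profile(stamps)
print(profile["hour"][9])  # 2 messages were sent in the 9 o'clock hour
```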

Next come the basic chatting features of each member, such as chatting frequency and how often each member starts or ends a topic.
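The post does not define when a topic starts or ends. One plausible rule, assumed here purely for illustration, is that a silence longer than some gap (30 minutes below, an arbitrary choice) separates two topics: the first speaker after the gap is credited with starting a topic, and the last speaker before it with ending one.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical rule, not taken from the tool: a silence longer than GAP separates two topics.
GAP = timedelta(minutes=30)


def topic_starters_and_enders(messages, gap=GAP):
    """messages: list of (timestamp, nickname) sorted by time.
    Returns two Counters: who opened topics and who closed them."""
    starters, enders = Counter(), Counter()
    for i, (stamp, nickname) in enumerate(messages):
        if i == 0 or stamp - messages[i - 1][0] > gap:
            starters[nickname] += 1
        if i == len(messages) - 1 or messages[i + 1][0] - stamp > gap:
            enders[nickname] += 1
    return starters, enders


chat = [(datetime(2017, 10, 1, 9, 0), "A"), (datetime(2017, 10, 1, 9, 5), "B"),
        (datetime(2017, 10, 1, 12, 0), "B"), (datetime(2017, 10, 1, 12, 1), "C")]
starters, enders = topic_starters_and_enders(chat)
```

Here the three-hour silence splits the chat into two topics, so A and B each start one, while B and C each end one.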

The following figures concern associations among members. For example, who replies to me most, and whom do I reply to most?

The direction index (DI) in the following figure is calculated as DI = (x - y)/(x + y), where x is the number of times the other person replied to me and y is the number of times I replied to that person. DI = 0 means the two people chat with each other in a balanced way; DI = 1 means the other person keeps replying to me while I never reply to him or her; DI = -1 means the reverse.
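To make the DI formula concrete, the sketch below counts replies and then computes DI for a pair of members. How the tool decides that one message replies to another is not stated; the sketch assumes, hypothetically, that a message counts as a reply to the immediately preceding message whenever the two senders differ. Replies are stored as (replier, addressee) pairs, and both function names are illustrative.

```python
from collections import Counter


def count_replies(speakers):
    """speakers: nicknames in chronological order.
    Hypothetical rule: a message replies to the previous message if the senders differ.
    Returns Counter{(replier, addressee): count}."""
    replies = Counter()
    for previous, current in zip(speakers, speakers[1:]):
        if current != previous:
            replies[(current, previous)] += 1
    return replies


def direction_index(replies, me, other):
    """DI = (x - y) / (x + y): x = other replying to me, y = me replying to other."""
    x = replies[(other, me)]
    y = replies[(me, other)]
    if x + y == 0:
        return None  # the two never replied to each other, so DI is undefined
    return (x - y) / (x + y)


replies = count_replies(["A", "B", "A", "C", "A", "B"])
```

With speakers A, B, A, C, A, B in that order, B replies to A twice and A replies to B once, so DI for A with respect to B is (2 - 1)/(2 + 1) = 1/3, while A and C replied to each other once each, giving DI = 0.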


The network graph below shows the total number of replies between each pair of members as the thickness of the connecting line.

Besides the total, the network graph can also show direction. For members A and B, if the number of times A replied to B exceeds half of the total (A replying to B plus B replying to A), this imbalance is drawn as an arrow A→B. The graph below uses a stricter threshold of 53%.
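The arrow rule is a share test applied to each pair of members. The sketch below takes reply counts keyed by (replier, addressee) pairs, a data layout assumed here for illustration; the function name is hypothetical, and the 0.53 default mirrors the threshold mentioned above.

```python
from collections import Counter
from itertools import combinations


def reply_graph_edges(replies, threshold=0.53):
    """replies: Counter{(replier, addressee): count}.
    Returns (weights, arrows): weights maps each pair to its total replies in both
    directions (line thickness); arrows lists (a, b) when a's share of the pair's
    replies, i.e. a replying to b, exceeds the threshold."""
    people = sorted({name for pair in replies for name in pair})
    weights, arrows = {}, []
    for a, b in combinations(people, 2):
        a_to_b, b_to_a = replies[(a, b)], replies[(b, a)]
        total = a_to_b + b_to_a
        if total == 0:
            continue  # no edge between members who never replied to each other
        weights[(a, b)] = total
        if a_to_b / total > threshold:
            arrows.append((a, b))
        elif b_to_a / total > threshold:
            arrows.append((b, a))
    return weights, arrows


counts = Counter({("A", "B"): 7, ("B", "A"): 3, ("A", "C"): 5, ("C", "A"): 5, ("B", "C"): 1})
weights, arrows = reply_graph_edges(counts)
```

A replied to B in 70% of their exchanges, so A→B gets an arrow; A and C are exactly balanced at 50%, so their line has no arrow. The weights and arrows could then be drawn with networkx, which the tool imports, though the post does not show how the tool draws its graph.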

Finally, let’s move on to textual analysis. The tool can generate word clouds with three algorithms. The default is TF-IDF (term frequency-inverse document frequency); the second considers term frequency only; the third is TextRank, which ranks words by their position in a word network built from the text. Word segmentation (parsing) is the foundation of the word cloud. For segmentation, the tool lets users configure how numbers and letters are handled, a stop-words dictionary, a user-defined dictionary, and so on.
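As an illustration of the default TF-IDF weighting, the sketch below scores words that have already been segmented (the tool imports jieba, a Chinese word-segmentation library; segmentation itself is not repeated here). The post does not say what counts as a "document" for the IDF part, so the sketch simply assumes a list of documents, such as one per member or one per message. It uses the textbook formula tf × log(N/df) without smoothing, so a word that appears in every document gets weight 0; scikit-learn's TfidfVectorizer smooths the IDF by default and would give somewhat different numbers.

```python
import math
from collections import Counter


def tfidf_weights(documents, stop_words=frozenset()):
    """documents: list of token lists (already segmented into words).
    Returns one {word: tf-idf weight} dict per document."""
    docs = [[w for w in doc if w not in stop_words] for doc in documents]
    n_docs = len(docs)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for doc in docs for word in set(doc))
    results = []
    for doc in docs:
        tf = Counter(doc)
        total = sum(tf.values()) or 1  # avoid dividing by zero for empty documents
        results.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return results


weights = tfidf_weights([["hotpot", "tasty", "hotpot"], ["tasty", "movie"]])
```

"tasty" occurs in both documents and therefore scores 0, while "hotpot" and "movie" stand out in their own documents. A {word: weight} dict like these can be passed to WordCloud.generate_from_frequencies in the wordcloud package to draw the cloud.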

The following picture illustrates the word cloud for the records of all members.

More complex textual analysis works at the sentence level, extracting key sentences to form a chat summary. This feature is still experimental. The main algorithm first computes the similarity between every pair of sentences, then either applies affinity propagation clustering to extract exemplar sentences that represent the others, or uses the TextRank algorithm to extract the sentences with the highest centrality in the sentence-similarity network.
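The TextRank branch can be sketched as follows. The post does not say how sentence similarity is measured, so the sketch assumes cosine similarity over raw word counts of pre-segmented sentences; scores then come from the weighted PageRank iteration that TextRank is built on, with the commonly used damping factor of 0.85. All function names are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using raw word counts (an assumed measure)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def textrank_sentences(sentences, top_k=2, damping=0.85, iterations=100, tol=1e-8):
    """sentences: list of token lists. Returns indices of the top_k sentences, in original order."""
    n = len(sentences)
    if n == 0:
        return []
    # Weighted, undirected similarity graph without self-loops.
    weights = [[cosine(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight j -> i.
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            new.append((1 - damping) / n + damping * incoming)
        converged = max(abs(x - y) for x, y in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])  # keep the summary in chronological order


chat = [["hotpot", "tonight"], ["hotpot", "tonight", "who"], ["hotpot", "tonight", "me"], ["weather"]]
summary = textrank_sentences(chat, top_k=1)
```

The first sentence overlaps most with the others and is picked; the unrelated "weather" sentence ranks last. For the affinity-propagation branch, scikit-learn's AffinityPropagation accepts a precomputed similarity matrix (affinity="precomputed"), and its cluster_centers_indices_ attribute identifies the exemplar sentences; whether the tool calls it exactly this way is not stated.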

Here is my personal chatting summary.

This Post Has One Comment

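  1. The software closes automatically at the step "Specify the nicknames used in the analysis". How can this be solved? The group being analyzed has 2,000 members and about 13,000 records. A reply by email would be even better. Thanks!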

