Victor Gevers
@0xDUDE
Based on the 3,784,309,399 WeChat messages, we used NLP tools to try to build a "keyword trigger list": keywords that possibly triggered the automatic selection of the entire conversation for storage and review.
Image: i.imgur.com/PWNQEpe.png
Text: pastebin.com/raw/LCPyenzC pic.twitter.com/0H8r1N1jBH

Victor Gevers
@0xDUDE
2 Mar
Can anyone (from China) identify these messaging services?
imsg <--...
qg <--...
qqmesg. <-- imqq.com
wwmsg <--...
wxmsg <--...
yymsg <--...
China has a surveillance program on social networks that looks like a jerry-rigged clone of the NSA's PRISM.

Victor Gevers
@0xDUDE
18 Mar
Every day, roughly 1 billion private messages are selected and routed to the closest "operator" based on geolocation. It's fascinating how quickly new monitoring solutions are deployed, in the same way as the old ones that were discovered and taken down. Country-based filtering for "protection". pic.twitter.com/Cm5HtR5PSk
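The thread does not say how the routing works. Below is a minimal sketch, assuming each "operator" is an endpoint at a fixed latitude/longitude and that "closest" means great-circle (haversine) distance to the message's GPS tag. The `Operator` class, its names and coordinates are all made up for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Operator:
    """A hypothetical review endpoint at a fixed location."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def route(lat: float, lon: float, operators: list[Operator]) -> Operator:
    """Pick the operator closest to a message's GPS tag."""
    return min(operators, key=lambda op: haversine_km(lat, lon, op.lat, op.lon))


if __name__ == "__main__":
    # Made-up operator positions, roughly Beijing, Shanghai and Guangzhou.
    ops = [
        Operator("north", 39.90, 116.40),
        Operator("east", 31.23, 121.47),
        Operator("south", 23.13, 113.26),
    ]
    # A message tagged near Shenzhen (22.54 N, 114.06 E) is routed to "south".
    print(route(22.54, 114.06, ops).name)
```

A linear `min` over the operators is enough for a handful of endpoints; with many endpoints, a spatial index would be the usual choice.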
Victor Gevers
@0xDUDE
22 Apr
What we have learned from 1,081,231,257 "captured" WeChat dialogues (3,784,309,399 messages) made on 18 March 2019 is that they were automatically selected for "review" based on a "keyword" trigger.
Not all the dialogues were in Chinese or only had GPS coordinates in China. pic.twitter.com/4eyYgCXD3C

Victor Gevers
@0xDUDE
22 Apr
Of the 3,784,309,399 messages, 3,698,798,784 were written in Chinese, 59,378,236 in English and 26,132,379 in another language.
98% of the Chinese messages had a GPS location in China. 68% of the English messages were sent in China. More than 19 million were sent from outside 🇨🇳 pic.twitter.com/Va8Lfk3dnw
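These counts can be cross-checked with a few lines of arithmetic. The script assumes that the "more than 19 million" refers to the English messages not sent in China (100% - 68% = 32%); the tweet itself does not say this explicitly.

```python
# Message counts from the thread (18 March 2019 dataset).
TOTAL = 3_784_309_399
CHINESE = 3_698_798_784
ENGLISH = 59_378_236
OTHER = 26_132_379


def share(part: int, whole: int = TOTAL) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# The three language buckets add up exactly to the total.
assert CHINESE + ENGLISH + OTHER == TOTAL

print(f"Chinese: {share(CHINESE):.2f}%")
print(f"English: {share(ENGLISH):.2f}%")
print(f"Other:   {share(OTHER):.2f}%")

# Assumption: 68% of English messages were sent in China, so 32% were not.
english_abroad = round(ENGLISH * 0.32)
print(f"English messages sent outside China: {english_abroad:,}")
```

Run as-is, it prints shares of about 97.74% (Chinese), 1.57% (English) and 0.69% (other), and 19,001,036 English messages sent outside China, which is consistent with the "more than 19 million" in the tweet.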
Victor Gevers
@0xDUDE
22 Apr
We were able to detect a pattern of a little more than 800 Chinese keywords (and keyword combinations) which would be the selection criteria for storing the entire WeChat dialogue in this database for further "analysis", most likely by a law enforcement agency. pic.twitter.com/VCvHp2pSyD
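A keyword "combination" trigger can be sketched as a list of rules, where each rule is a set of keywords that must all occur somewhere in a dialogue before the whole dialogue is stored. This is an assumption about how such criteria might be evaluated, not the actual system; `RULES` and its placeholder keywords are hypothetical and are not entries from the detected list.

```python
from typing import Iterable

# Hypothetical trigger rules: a rule fires only if ALL of its keywords appear.
RULES: list[frozenset[str]] = [
    frozenset({"keyword_a"}),               # single-keyword rule
    frozenset({"keyword_b", "keyword_c"}),  # combination rule
]


def dialogue_matches(messages: Iterable[str], rules: list[frozenset[str]] = RULES) -> bool:
    """Return True if any rule's keywords all occur in the dialogue.

    The keywords of one rule may be spread over different messages of the
    same dialogue. Matching is plain substring search, which also works for
    Chinese text, where words are not separated by spaces.
    """
    text = "\n".join(messages)
    return any(all(kw in text for kw in rule) for rule in rules)
```

For example, `dialogue_matches(["keyword_b first", "keyword_c later"])` is `True`, while `dialogue_matches(["keyword_b only"])` is `False` because the combination is incomplete.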
Victor Gevers
@0xDUDE
22 Apr
From the intercepted WeChat messages written in English, we could build a "dictionary" of 829 keywords (and keyword combinations). I was a bit surprised to see my full name, "Victor Gevers", in this generated English list. 维克多 葛弗斯 (my name in Chinese) was not in the Chinese keyword list. pic.twitter.com/j9NDhqNpmk

Victor Gevers
@0xDUDE
22 Apr
Using these keywords will not get your account locked. But if you send a contact a few messages containing a few hundred of these words, you will need to “unblock” your account after a few minutes. pic.twitter.com/KwvopraiQQ

Victor Gevers
@0xDUDE
25 Apr
Of the 3,784,309,399 intercepted messages, 59,378,236 were in English.
19 million were sent from outside Mainland China: South Korea, Taiwan, US, Australia, Canada, Colombia, Venezuela, Belgium, France, UK, Germany, Netherlands, Turkey, Italy, Switzerland, New Zealand and Ireland. pic.twitter.com/7UDRLLIICs

Victor Gevers
@0xDUDE
25 Apr
I am listening to the @riskybusiness show [twitter.com/riskybusiness/…], and I hear this at 21:50:
"We've got politicians in Australia who are using WeChat."
Wait!? What? So their conversations could have been among the 937,202 "flagged" conversations recorded in Australia? 🤷♂️ pic.twitter.com/hGwH1ScM4K

Victor Gevers
@0xDUDE
29 Apr
512.2 million WeChat accounts (unique wxids) sent 3,784,309,399 messages on 18 March 2019. 1 billion captured WeChat conversations contained keywords that marked them for "review". 59,378,236 messages were written in English.
19 million were sent from 🇰🇷🇹🇼🇺🇸🇦🇺🇨🇦🇨🇴🇻🇪🇧🇪🇫🇷🇬🇧🇩🇪🇳🇱🇹🇷🇮🇹🇨🇭🇳🇿🇮🇪 pic.twitter.com/uWuMEmspSA

Victor Gevers
@0xDUDE
10 Jun
BBC China social media: WeChat and the Surveillance State
bbc.com/news/blogs-chi…

Victor Gevers
@0xDUDE
7 Jul
In the "phrase matching" process, the Chinese data science student used the Chinese keywords from this wordlist: github.com/citizenlab/cha…
So we can safely assume that the keyword trigger list is far from complete, and we decided to redo this research from scratch... pic.twitter.com/472Bf22hzF
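The thread does not describe the student's phrase-matching code. One simple way to derive a candidate dictionary, sketched here under that assumption, is to count how many flagged dialogues contain each phrase of an existing wordlist and keep the phrases above a minimum count. `build_dictionary` and its `min_dialogues` threshold are hypothetical. The sketch also shows why the result is incomplete: a phrase missing from the input wordlist can never be found.

```python
from collections import Counter
from typing import Iterable


def build_dictionary(
    dialogues: Iterable[str],
    wordlist: Iterable[str],
    min_dialogues: int = 2,
) -> dict[str, int]:
    """Count how many dialogues contain each wordlist phrase.

    Only phrases already present in `wordlist` can ever be found, so the
    resulting dictionary is bounded by the completeness of that list.
    """
    phrases = set(wordlist)
    counts: Counter[str] = Counter()
    for text in dialogues:
        for phrase in phrases:
            if phrase in text:
                counts[phrase] += 1  # counted at most once per dialogue
    # Keep only phrases seen in enough dialogues to look like triggers.
    return {p: n for p, n in counts.items() if n >= min_dialogues}
```

With `dialogues=["a x b", "a y", "z"]` and `wordlist=["a", "b", "q"]`, the default threshold keeps only `{"a": 2}`; lowering `min_dialogues` to 1 also keeps `"b"`.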
Victor Gevers
@0xDUDE
3 Jan
A quick status update: the data scientist who created the current keyword list is still MIA [twitter.com/GDI_FDN/status…]. We have not made much progress. Yet new breadcrumbs are slowly surfacing thanks to the termination of third-party translation services. github.com/cookiemonster/… pic.twitter.com/wE8N5k56Ec

葱头
@wangyongcong
25 Apr
Great job. A few years ago, as a developer of an MMO game (which had an embedded chat system), I got access to a keyword list which was fucking amazing. Later, when I joined another project, I found the list had grown even bigger, including regular expressions.

Victor Gevers
@0xDUDE
25 Apr
It would be very interesting to get a better understanding of which regular expressions are used. For now, we assume they also apply a simple scoring based on the number of keywords, but the weight per keyword is still an estimate based on the number of repetitions.
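The assumed scoring can be sketched as follows: every pattern (plain keyword or regular expression) has a weight, a dialogue's score is the sum of weight times number of matches, and the dialogue is flagged once the score reaches a threshold. `WEIGHTS`, `THRESHOLD` and the placeholder patterns are hypothetical; as noted above, the real per-keyword weights are still estimates.

```python
import re

# Hypothetical weights; a pattern may be a plain keyword or a regular expression.
WEIGHTS: dict[str, float] = {
    "keyword_a": 1.0,
    "keyword_b": 2.5,
    r"key\s*word_c": 4.0,  # regex entry: also matches "key word_c"
}
THRESHOLD = 5.0

# Compile once so scoring many dialogues does not recompile the patterns.
_COMPILED = [(re.compile(pattern), weight) for pattern, weight in WEIGHTS.items()]


def score(text: str) -> float:
    """Sum of weight x number of non-overlapping matches, over all patterns."""
    return sum(weight * len(rx.findall(text)) for rx, weight in _COMPILED)


def is_flagged(text: str, threshold: float = THRESHOLD) -> bool:
    """Flag a dialogue whose score reaches the threshold."""
    return score(text) >= threshold
```

Under these made-up weights, `"keyword_a keyword_b"` scores 3.5 and is not flagged, while repeating `"keyword_b"` twice reaches 5.0 and is. Repetitions therefore count, which is why weights can only be estimated from how often keywords recur.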
momo
@hiiammomo
25 Apr
I don't believe that 98% of WeChat users have GPS turned on. How was the GPS information obtained? Were the other 2% using device models that could not be identified? 😅

Victor Gevers
@0xDUDE
25 Apr
It is about 1 billion conversations made on 18 March 2019. These are not all WeChat users; these are only the conversations that were flagged based on keywords. All of these records had GPS locations tagged to them.