Using Neural Networks to Find Commies

Using Neural Networks to Find Commies Anonymous 12/21/20 (Mon) 05:07:56 a51187 No.12112888

I posted a thread about a week ago announcing the start of this project. IIRC it was pinned as "notable" but I can't find it now, oh well.

I put the finishing touches on a functional beta version of a program that uses neural-network fuzzy name comparison to find potential overlap between datasets. While 1-for-1 matches are easy to identify, I can attest from extensive experience that real-world data is rarely perfect. This method should give more accurate comparisons when names differ in formatting and/or contain typos (e.g. Donald Trump vs D Trump vs Trump, Donald vs Dnoald Tump etc).
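To make the idea of a fuzzy comparison concrete, here is a minimal sketch. It is not the notebook's method and not a neural network: it swaps in the standard library's difflib.SequenceMatcher as a stand-in, and the helper names normalize and name_similarity are made up for illustration. It only shows how formatting differences and typos can still produce a high similarity score.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, turn 'Last, First' into 'First Last', collapse whitespace."""
    name = name.strip().lower()
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first} {last}"
    return " ".join(name.split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0 (hypothetical helper)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# The variants are the examples from the post above.
variants = ["Donald Trump", "D Trump", "Trump, Donald", "Dnoald Tump"]
for v in variants:
    print(f"{v!r}: {name_similarity('Donald Trump', v):.2f}")
```

A character-level ratio like this handles reordering and typos reasonably well but knows nothing about nicknames or initials, which is presumably where a trained model earns its keep. Either way, a high score only flags a pair for a closer look.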

My hope is that we can use this to efficiently and effectively identify the information of potential Chinese spies (from the recent lists) among the rosters of known organizations for further investigation (obviously same name doesn't guarantee same person). It should be fairly user friendly to run, you don't need to be a coder. You will need the 64-bit version of Anaconda installed as well as the 'hmni' Python library. I included the Jupyter notebook (open it within Anaconda's Jupyter Notebook and follow the instructions) as well as 40k entries from the leak ("base.csv"; this is all I have, curious if there's more that's easily retrievable) and a dummy "proof of concept" test database ("Test.csv") for you to play with.

Next step from there (beyond testing, of course) is to track down datasets to compare against, as well as to further refine the program. I wrote this agnostic of inputs, so it could just as easily be used for any leaks in the future. It's an O(N×M) algorithm, and in the current implementation I'm averaging about 3.1 minutes to complete the test run (400k comparisons) using all threads of my 4770k. Curious how it would perform on your guys' ends. As far as I'm concerned, it's completely open source. If you see something to improve, by all means do so. Might make sense to set up some way to keep track of changes, though.
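For anyone wondering what O(N×M) means in practice: every entry in one list gets compared against every entry in the other, so the work grows with the product of the two sizes. Below is a rough sketch under stated assumptions. It uses difflib instead of the notebook's model, the lists are made-up placeholders, and compare_all and the 0.8 threshold are hypothetical choices for illustration. The last lines just do the arithmetic on the timing quoted above.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption: the notebook uses its own model instead)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def compare_all(base, roster, threshold=0.8):
    """Compare every base entry with every roster entry: N x M comparisons."""
    comparisons = 0
    candidates = []
    for b, r in product(base, roster):
        comparisons += 1
        score = similarity(b, r)
        if score >= threshold:  # threshold is an arbitrary illustrative cutoff
            candidates.append((b, r, round(score, 2)))
    return comparisons, candidates


# Made-up placeholder data, not taken from any real dataset.
base = ["Donald Trump", "Jane Smith", "John Doe"]
roster = ["Dnoald Tump", "Jon Doe"]

comparisons, candidates = compare_all(base, roster)
print(comparisons)  # N * M comparisons in total
print(candidates)

# Throughput implied by the figures in the post: 400k comparisons in 3.1 minutes.
seconds = 3.1 * 60
rate = 400_000 / seconds
print(f"{rate:.0f} comparisons per second")
```

Since each pair is scored independently, the loop splits cleanly across threads or processes, which fits with the run using all threads of the CPU.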

Anyway hope it can help us. I'll hang around for a bit to answer any questions, provide any insight I can, etc etc.

https://filebin.net/e6m82xmc5n44f1n5/Name_Indentify.zip?t=nh9sz1l3

____________________________

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

Anonymous 12/21/20 (Mon) 05:24:52 000000 No.12113052

>>12112888

>I posted a thread about a week ago announcing the start of this project. IIRC it was pinned as "notable" but I can't find it now, oh well.

The Chinese anons responded to you in the China thread earlier:

>>12019047

Yours:

>>12017045
