
/qresearch/ - Q Research Board

Research and discussion about Q's crumbs

File: 16d6aaaded43169⋯.jpg (26.69 KB, 657x527, 657:527, Frog Detective.jpg)

85a843 No.494745

A place for codefags to make the chans searchable.

8e345d No.494816


ctrl-f as in fagg0t like 0p

85a843 No.495005

File: 5bf2c7f24f70616⋯.png (107.74 KB, 1826x973, 1826:973, Snip 1.PNG)

File: 83b29fb7a9e7d68⋯.png (86.44 KB, 1885x1007, 1885:1007, Snip 2.PNG)

File: 446d61fffe32244⋯.png (63.29 KB, 1119x994, 1119:994, Snip 3.PNG)

File: 37d58ab626e65b6⋯.png (55.22 KB, 1826x687, 1826:687, Snip 4.PNG)

Posts from #608

1eba68 No.495890

One further comment from a heavy database user for what it's worth:

If we had a list of 'tags' that anons could enter as they post (in a specific format, e.g. preceded by **), covering topics that emerge (such as 'mkultra', 'bridge', etc., related to topics brought up by Q), it would serve as a way to link crumbs by subject and, as an additional variable/filter, would streamline any search through the data.

These would have to be moderated by BV / BO / Baker; it would not be any more work than creating the notable posts per bread, although it would be useful to find a way to insert them after the post identifier to create the link (i.e. >>xxxxxx within/2 **xxxxxx = TRUE).

Historical posts would be an issue, but that could be handled if there were some way of batch-adding tags at the data-assimilation stage based on linked crumbs, along with specific 'meta-moderators' adding them as we run searches, etc.

However it might work, the principle is an easily assignable value to identify crumb subject based on Q's topics, so that more information can be retrieved via regular search.
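
A minimal sketch of how the proposed ** tag format could be pulled out of post bodies, in Python for illustration only. The tag syntax, the function name, and the token rules are assumptions based on the proposal above, not an agreed format.

```python
import re

# Hypothetical tag syntax from the proposal above: tokens prefixed with **,
# e.g. "**mkultra" or "**bridge", entered anywhere in a post body.
TAG_RE = re.compile(r"\*\*([A-Za-z0-9_-]+)")
REF_RE = re.compile(r">>(\d+)")

def extract_tags_and_refs(body):
    """Return (tags, referenced post numbers) found in a post body."""
    tags = [t.lower() for t in TAG_RE.findall(body)]
    refs = [int(n) for n in REF_RE.findall(body)]
    return tags, refs

tags, refs = extract_tags_and_refs(">>495005 digging on **mkultra and **bridge")
# tags -> ['mkultra', 'bridge'], refs -> [495005]
```

A pass like this at assimilation time would give the batch-added historical tags described above.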

85a843 No.496386

File: da133a2467e3170⋯.png (6.88 KB, 768x100, 192:25, Snip 5.PNG)

85a843 No.496431


>One further comment

Well, you kind of lost me pretty quickly. Correct me if I'm wrong, but what you're suggesting is for posts going forward, and posts that are Q centric.

My goal is to see ALL of the board searchable because much of the digging and research that was collected was not just related to items Q had in mind, but many ancillary topics and evidence discovered would help build the "parallel construct".

That's what I see as important, your thoughts?

91c771 No.496858

Might I suggest using SQLite as the DB for the "file format". It's a single-file db that performs well for read-heavy workloads, is easy to distribute, is easily usable from PHP and just about any other programming language, and could easily be used to load a regular server-based db (obviously depending on how the schema is designed). It's also multi-platform, so it should keep everybody happy irrespective of what OS you use.

91c771 No.496999


I forgot to mention, SQLite also supports full-text indices via the FTS4 virtual table type.
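
A quick sketch of that FTS4 idea, using Python's built-in sqlite3 module (the column names and sample rows are illustrative, not a proposed schema; FTS4 must be compiled into the SQLite build, which it is in standard distributions):

```python
import sqlite3

# Single-file SQLite DB with an FTS4 virtual table for full-text search
# over post bodies, per the suggestion above.
conn = sqlite3.connect(":memory:")  # use a file path for a distributable db
conn.execute("CREATE VIRTUAL TABLE posts USING fts4(no, body)")
conn.executemany("INSERT INTO posts (no, body) VALUES (?, ?)", [
    ("494745", "A place for codefags to make the chans searchable."),
    ("495890", "tags covering topics such as mkultra and bridge"),
])
# MATCH against the table name searches all indexed columns
hits = conn.execute(
    "SELECT no FROM posts WHERE posts MATCH ?", ("mkultra",)
).fetchall()
# hits -> [('495890',)]
```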

6a8d0c No.497972

I made a thread a few minutes ago asking whether a wiki could be a good format to organize findings. It could help with navigating. What do you think?


dbb4a4 No.498755


I've been working on exactly this. I'm pulling the catalog from ga & qresearch. Finding the research general threads and saving those with q posts. Only goes back to about 2/15 when I turned the machine on. Currently working on getting old posts reconstructed. 99% sure I can grab all breads from 8ch.

C# dll to scrape q posts and threads from 8ch. 8ch+ json format but could be serialized XML I guess

4073db No.499327

One of the anons from the other thread.

I'm not going to jump in too much if others are doing something where we end up stepping on Pepe's toes. Couple of thoughts though…

- Full text searches/indexes can be garbage. Only good for reserved words

- Most likely want this in a relational database. Creating the schema would consist of a really simple data model. Not even sure I would worry about normalizing it.

- Messages (body) could be stored in a blob and be searched with wildcards.

- Only looking at about 10-15 different queries tops. All simple SQL statements except for a couple that would need to be hierarchical..but still easy.

- I was thinking to use MySQL or SQL Server for the DB Engine.

- Biggest challenge will be the parsing of the threads and crumbs into a loaded format for the database. Once in a useful format…loading will be easy.

I see three main parts to this:

1) Getting the data so it can be loaded into a database.

2) Creating the database structures (really should be first)

3) Spitting out the queries, views, and sprocs that will be used. And putting a front end on it.

* almost doxxed myself and put a link to my web site…so close :-)
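
The "really simple data model" and wildcard body search described above might look like this; sketched with SQLite standing in for the MySQL / SQL Server engine the anon mentions, and with illustrative column names:

```python
import sqlite3

# Minimal relational model for posts: one flat table, body searched
# with wildcards, no normalization (per the thoughts above).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE post (
        no        INTEGER PRIMARY KEY,   -- post number
        thread_id INTEGER,               -- bread it appeared in
        board     TEXT,
        posted_at INTEGER,               -- unix time
        body      TEXT                   -- message body, wildcard-searchable
    )
""")
conn.execute(
    "INSERT INTO post VALUES (?, ?, ?, ?, ?)",
    (494745, 494745, "qresearch", 1519700000, "make the chans searchable"),
)
rows = conn.execute(
    "SELECT no FROM post WHERE body LIKE ?", ("%searchable%",)
).fetchall()
# rows -> [(494745,)]
```

Loading (part 1 above) then becomes a straight mapping from the scraped JSON fields into these columns.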

4073db No.499341


>C# dll to scrape q posts and threads from 8ch. 8ch+ json format but could be serialized XML I guess

Good call me thinks

4073db No.499393


I think that is a great thought. May be a good idea to just get one set started and loaded then look into the other boards.

We (at least I) can't see a way to search the 'board' itself, other than to create a copy of the data in the threads and make that searchable.

7cdf2a No.501143


>Download Chan.

>Host JSON of posts.

>Build simple interface.

>Use nginx as reverse proxy.



Why the fuck do you want a DB when it's already JSON. FFS.

8dbdfa No.501166


Open Source, Cross Platform search engine library - xapian.org

8dbdfa No.501352


github .com/mcmontero/php-xapian JSON support and web-friendly middleware

7cdf2a No.501408

File: 3e025a52fbb6b5e⋯.png (27.01 KB, 634x278, 317:139, Screenshot from 2018-02-26….png)

A better way to do this is probably to put everything client side. Make a cross-platform application that just fetches new posts every so often. The browser is pretty perfect for this if we can set up a cross-platform local server to host a local copy of qcodefag and this board.


https:// github.com/bvaughn/js-search

Pros: Fast enough once index is built.

Cons: Have to build index, or send it from a server, ipfs, blockchain, whatever.


rip it from qcodefag for q posts

Add 8ch layout to some button on qcodefag or some tab

Display the posts as normal, but add search bar for board side of new client for qcodefag and this board.

Pic related, it's easy to get .json formatted threads.

inb4 we all pwn ourselves.
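
The js-search approach above boils down to a client-side inverted index: build token → post-number lists once, then lookups are instant. Sketched here in Python for brevity; the real thing would be JavaScript in the browser, and the tokenizer here is a naive assumption.

```python
from collections import defaultdict

def build_index(posts):
    """Map each lowercase whitespace token to the set of posts containing it."""
    index = defaultdict(set)
    for no, body in posts.items():
        for token in body.lower().split():
            index[token].add(no)
    return index

posts = {
    494745: "A place for codefags to make the chans searchable.",
    501408: "put everything client side",
}
index = build_index(posts)
# index["client"] -> {501408}
```

The "cons" above are exactly this build step: the index either gets rebuilt locally from the fetched JSON or shipped prebuilt from a server.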

7cdf2a No.501440


Conveniently this also alleviates the clown issue should that garbage bill pass. I mean not really since we'll still own ourselves but fuck, we can try.

8143cc No.502752




>>Research Threads Ideas. Please claim or create yours, let us know of more subject ideas


>>Quest for Research Searchability Thread


>>>494745 (You) (You)


>Thanks for including my thread. I'm not a coder so I'm not much more than a cheerleader. I am quite sincere in my belief that we have to make it all searchable. I'm not naive enough to expect a volunteer to tackle it. Without doxxing themselves, can any anon point me to a service or company that could accomplish this Quest?

8143cc No.502773


>>494745 (You) (You)

Thanks for including my thread. I'm not a coder so I'm not much more than a cheerleader. I am quite sincere in my belief that we have to make it all searchable. I'm not naive enough to expect a volunteer to tackle it. Without doxxing themselves, can any anon point me to a service or company that could accomplish this Quest?

885f7e No.502802


A pleasure anon. Here's wishing you all, all the very best in this noble quest. It would be Christmas for us all if you did it. GODSPEED.

8dbdfa No.502951


The omega interface for xapian could do most of the work: wget to grab site data, a json->csv converter to translate, and you'd be ready to go. All free and open-source software. Not quite plug and play, but a start.

Sample usage described:

xapian.org /docs/omega/overview.html

linode .com has very affordable linux shell hosting.
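
The json->csv conversion step mentioned above is a small script; a hypothetical sketch in Python (the field names mirror the 8ch JSON, but the column choice is an assumption, not what omega requires):

```python
import csv, io, json

def posts_to_csv(thread_json):
    """Flatten the post fields we care about into one CSV row per post."""
    out = io.StringIO()
    w = csv.writer(out)
    w.writerow(["no", "time", "com"])
    for p in json.loads(thread_json).get("posts", []):
        w.writerow([p.get("no"), p.get("time"), p.get("com", "")])
    return out.getvalue()

sample = '{"posts": [{"no": 494745, "time": 1519700000, "com": "searchable"}]}'
csv_text = posts_to_csv(sample)
# first line -> "no,time,com"
```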

e4e2ff No.503420

Hey just had a thought but couldn't /ourguys/ look at all bullets ( like they have to on a crime scene )?

Wouldn't "LIPPEL" the one who had been "grazed" be able to connect bullet to her dna with whatever DNA would be on her?

What about the other student who was walking after being shot in both legs by 4 rounds?

Where is the DNA for that match to bullet?

What about the dead coach, the HERO we seen at the funeral? DNA match to that?

All this stuff might not help us ATM but IMO,

would play a big hand in the game out there with Q and friends?

https:// www.youtube.com/watch?v=cPvYxTa1ph4



Another thing: in this video she talks about "BREAKING THE GLASS WITH SHOTS" starting at 2:05, and then she says they arrived…


At the end of the video she states the "Swat team/Police" were on the ground; she said they were banging on the doors to let them in, and she "DIDN'T TRUST IT WAS THEM, BECAUSE THE POLICE WERE BANGING ON THE DOORS - NOBODY GOT UP".


Whole story right here in the video proves it was either a False Flag or some type of fuckery

1ca2ba No.503685


It's not really a company, but wouldn't the person running the 4plebs archive be a good place to look for tools/code in this quest? Maybe he'd even be willing to assist? The site uses some fairly powerful search tools for certain halfchan boards already. I'm not a codefag so I apologize if this hasn't been suggested already.

https:// archive.4plebs.org/_/articles/faq/

8143cc No.503938


That's a good suggestion. Do you know off the top of your head how many archive sites have been used at 4ch and 8ch? I know about archive.is and 4plebs, but I've seen a lot more. I'm pretty sure the threads are scattered about the internet.

912746 No.504261


What about a bulletin-board type of system, like vBulletin for example? Built-in search and different forums and sub-forums for topics.

ae5226 No.504748

File: c355f8e55c85a3b⋯.png (33.62 KB, 189x244, 189:244, Screen Shot 2018-02-26 at ….png)

ae5226 No.505007

File: 6a8f1ccdd28a3a6⋯.png (24.86 KB, 181x245, 181:245, Screen Shot 2018-02-26 at ….png)


8143cc No.505181



Thanks, I'm sure there's a lot of good stuff here, I monitor them daily. Are you implying posts have been excised from threads and posted in these subs?

8143cc No.505196


> excised from threads and posted in these subs?

Sorry, didn't finish my thought, and they might not be captured in a search of posts in Qresearch? Not sure why you posted these.

ae5226 No.505256

File: 6a1582dc8728f23⋯.png (41.71 KB, 176x257, 176:257, Screen Shot 2018-02-26 at ….png)

File: 29c38887f12c348⋯.png (33.06 KB, 190x248, 95:124, Screen Shot 2018-02-26 at ….png)



ec7b2a No.506133

File: facc20f480a8350⋯.gif (11.98 KB, 333x110, 333:110, sociopol_falseflag29.gif)

In naval warfare, a "false flag" refers to an attack where a vessel flies a flag other than its true battle flag before engaging the enemy.

It is a trick, designed to deceive the enemy about the true nature and origin of an attack.

In the democratic era, where governments require at least a plausible pretext before sending their nation to war, it has been adapted as a psychological warfare tactic to deceive a government's own population into believing that an enemy nation has attacked them.

In the 1780s, Swedish King Gustav III was looking for a way to unite an increasingly divided nation and raise his own falling political fortunes.

Deciding that a war with Russia would be a sufficient distraction but lacking the political authority to send the nation to war unilaterally, he arranged for the head tailor of the Swedish Opera House to sew some Russian military uniforms.

Swedish troops were then dressed in the uniforms and sent to attack Sweden's own Finnish border post along the Russian border. The citizens in Stockholm, believing it to be a genuine Russian attack, were suitably outraged, and the Swedish-Russian War of 1788-1790 began.

In 1931 Japan was looking for a pretext to invade Manchuria. On September 18th of that year, a Lieutenant in the Imperial Japanese Army detonated a small amount of TNT along a Japanese-owned railway in the Manchurian city of Mukden.

The act was blamed on Chinese dissidents and used to justify the occupation of Manchuria just six months later. When the deception was later exposed, Japan was diplomatically shunned and forced to withdraw from the League of Nations.

In 1939 Heinrich Himmler masterminded a plan to convince the public that Germany was the victim of Polish aggression in order to justify the invasion of Poland.

It culminated in an attack on Sender Gleiwitz, a German radio station near the Polish border, by Polish prisoners who were dressed up in Polish military uniforms, shot dead, and left at the station.

The Germans then broadcast an anti-German message in Polish from the station, pretended that it had come from a Polish military unit that had attacked Sender Gleiwitz, and presented the dead bodies as evidence of the attack. Hitler invaded Poland immediately thereafter, starting World War II.

http:// www.bibliotecapleyades.net/sociopolitica/sociopol_falseflag29.htm

For hundreds of links to FF research/reports, use this link below. You are welcome Anons..

http:// www.bibliotecapleyades.net/sociopolitica/sociopol_falseflag.htm

8143cc No.509646


>person running the 4plebs archive be a good place to look for tools/code in this quest?

For the archives 4plebs uses sphinx search (http:// sphinxsearch.com/). It's used to index from the database and display search results very quickly.

Easy to implement but I would say it's worth it only if you have a lot of data to search through. For smaller datasets you can use full text search included in a regular database engine.

Also you can take a look at other search engines like Solr (http:// lucene.apache.org/solr/) and elasticsearch (https:// www.elastic.co/)

af8c7d No.510581


been using duckduck for searches

af8c7d No.510592

cryptocert keys moded on puter… should i reboot or undo?

9176e6 No.519706



I also would second the idea of using Sphinx - it can be connected to a currently live database and given clues and sample queries to index all text in the DB - https:// www.percona.com/resources/technical-presentations/how-optimally-configure-sphinx-search-mysql-percona-live-mysql - and they have a video. I don't think there are any existing Docker setups to play with, although I imagine 8ch is quite custom anyway.

dbb4a4 No.520068



OK So I think I've got my chanscraper console app working as designed.

AFAIK, I've got all the QPosts in a single JSON, and I've got complete breads starting with Bread #364 (2018-02-07). That's as far back as I've been able to reach programmatically. Each complete bread has also been filtered into another json file containing just Q's posts.

The complete breads have only come from 8ch. The chanscraper is set up where it could scrape 4ch as well - assuming the json is still available.

I'm showing 825 QPosts - 1 more than qCodeFag because I believe I have a deleted one. All counted it's 210 threads.

I've done all the hard work of setting up the old catalog/threads/posts. It's set up so you can specify how far back to refresh (to cut down on unnecessary HTTP GETs). It reads in the existing data, finds the new threads to search for on 8ch/greatawakening and 8ch/qresearch, and then archives the threads/posts that Q has made locally.

If anybody wants the full Q archive as I have it now, here it is: 6mb https:// anonfile.com/H6B7G7dcbc/QJsonArchive.zip

I'm going to integrate the DJTweets + minute Deltas in this week.

Once I get this all cleaned up I'll cut it loose on Github if there are any C#codeFags interested.

My idea is to set up a simple HTML page using some JavaScript that can be run locally on a single user's machine or on a website. Since the scraper is a C# DLL, it could be set up to run as a timed service on a web server to keep a site up to date.
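
The "filter each bread down to just Q's posts" step described above is essentially one predicate over the thread JSON. A hedged sketch in Python (the real scraper is C#); the post dicts mirror the 8ch JSON field names, and the trip value is Q's trip as it appears later in this thread:

```python
Q_TRIP = "!UW.yye1fxo"

def q_posts(bread):
    """Keep only the posts in a thread's JSON that carry Q's trip."""
    return [p for p in bread.get("posts", []) if p.get("trip") == Q_TRIP]

bread = {"posts": [
    {"no": 544985, "trip": "!UW.yye1fxo", "com": "NOT A REAL Q POST."},
    {"no": 545000, "trip": None, "com": "anon reply"},
]}
# q_posts(bread) keeps only post 544985
```

Note this is exactly why tripless posts get filtered out, as discovered further down the thread.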

98bd4e No.520151


Code at github.com/anonsw/qtmerge does some similar things. Check it out, maybe there are some useful ideas to lift from there: anonsw.github.io

dbb4a4 No.520179


Yeah I knew about that - but I'd already been getting data from QCodeFag. The QCodeFag data was the basis for what I have now since it had already done the scraping on 4ch. I wanted my own in C# source going forward that I can use locally with my other C# code.

7cdf2a No.520183

I don't know why nobody cares, but it's trivial to download threads, posts, and boards through the 8ch API in the form of JSON. There is no reason not to have the local client make the GET request every so often.
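
The JSON endpoints in question follow a simple URL pattern, so a local client only has to build the URL and poll it. A sketch (the paths are inferred from the 8ch links and the catalog/threads.json mention elsewhere in this thread; verify against the live API):

```python
def thread_json_url(board, thread_no):
    """JSON counterpart of /board/res/THREAD.html."""
    return "https://8ch.net/%s/res/%d.json" % (board, thread_no)

def catalog_json_url(board):
    """Board-level catalog listing current threads."""
    return "https://8ch.net/%s/catalog.json" % board

# thread_json_url("qresearch", 544266)
#   -> "https://8ch.net/qresearch/res/544266.json"
```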

dbb4a4 No.520193


Yep. That's why I did it. Getting all the JSON is easy once you know where everything is - but stuff sliding off the catalog was what made me want to keep a local archive.

7cdf2a No.520201


I meant the hypothetical client with which people are searching this board and staying updated. That client should search for posts all on its own instead of relying on a single source of truth. (Saves infrastructure money too.)

dbb4a4 No.520237



Once I get it finished I'll provide a single HTML page that is like QCodeFag. View on your desktop.

Run the chanscraper then view the HTML to see new posts

98bd4e No.520263


Cool, check out qanonmap too for posts no longer retrievable. I think they have some that qcodefag doesn't have.

dbb4a4 No.520278



Whazza qanonmap url?

https:// qanonposts.com/ ok?

98bd4e No.520298




not sure if thestoryofq.com is related

But they are qcodefag forks.

dbb4a4 No.520323



Duh. I had it.

I noticed that qanonmap.github.io has 827 posts and qanonposts.com has 824.

That's going to cause my OCD great consternation.

98bd4e No.520373


Yep, but I think new ones just haven't been added yet to qcodefag.

dbb4a4 No.520391


Hmm.. That doesn't help me - I've got those. I'm only showing 825

07564d No.524371


Ctrl-f is only good on a single thread. What researchers really need is a way to access the entire set of Q posts. I've built that capability for myself locally by parsing ctrl-s saves of the threads into a MySQL database and running SQL searches on that.

The best bet for a public search engine might be to cooperate with CodeMonkey to build a search capability for the boards. We'd still have to search each board separately, but at least we would be able to search each board all at once.

I've got most of the Q related posts from 4chan and 8ch locally, but I'm not sure how to make that much data publicly available. I've also got a fair amount of PHP code that I use to access and organize the raw data. I'd be willing to share it if I had a place to do it.

07564d No.524384


Actually, I have had chan posts show up in browser search engine results, but I know this isn't what you're after. I've built the type of search capability you're after on my local machine. It still takes a lot of time to work with the posts, but it's definitely easier than anything we can do at the original sources.

07564d No.524395


Timeline is easily generated when one has the ability to set the post time to something other than the current time. That's how I create timeline posts in my own database.

07564d No.524431


I definitely appreciate that notable posts are included in the breads on each thread. It isn't necessary for them to be updated on each and every thread, but it is good to have them updated at least every day. Right now, I'm using the links in the bread posts to mark posts in my private database as being included in the bread. Given the volume of posts that I am now working with, these links make it easier to determine what is important to include.

07564d No.524456


I use PHP because it's free. *shrug*

07564d No.524489


If you're lucky, you can find your archives on archive.org. That site saves pages with nearly the same HTML elements as the original page. Archive.is converts the classes used on the original page into their style equivalents, making for a parsing nightmare. When I've had to use the archive.is version of a page, it was a painstaking process to recreate the single post that I went to the archive to get. My parser code can parse the archive.org archives the same as the original, so it's easy to get all posts from that archive.

07564d No.524503


I've already done this. I'm willing to share my data structures and parsers, if I have a place to do it.

07564d No.524511


I've got tagging fields included in my data structure. Getting them filled is an entirely different matter. I've got a tool to help do it more efficiently than phpMyAdmin, but it needs a bit of work to make it just a bit more efficient so that more than one post can be updated in one pass.

07564d No.524530


The challenge is classifying the posts to determine which sub forum to direct them to. Not trivial.

07564d No.524543


There are over 750,000 total posts from both sites and all boards containing Q related posts. It's a large data set now.

838074 No.524965

Why not just build a 4chan archive site? That's the main thing lacking from 8ch.

7cdf2a No.525489

Literally just build an index of tags and use fucking client side javascript. Muh databases. Jesus Christ people. You could even let users share tags.

First one with a completed project wins. Peace.

dbb4a4 No.525531


https:// 8ch.net/qresearch/archive/index.html

dbb4a4 No.527353


Here's the archive again + a handy HTML page that you can use in your browser to view the archives locally. Works fine in Chrome and IE. Readme included.

https:// anonfile.com/W3f5H6d8be/QJSONArchive.zip

838074 No.529626


OK, so why not do the fashionable, continuous integration FOSS thing and add searching to the archive site at the repo?

dbb4a4 No.530101


I expect because 8ch is not a massive corporation with a bunch of resources at their disposal. /sudo/

838074 No.530155


What difference does that make? Anons are gathered here. Why don't they just go there to assist in development instead of fragmenting and branching out to 1000 directions? Consolidate, integrate, then diverge.

ca4dab No.530283

File: 40d6ea4f332a28a⋯.png (377.24 KB, 1822x426, 911:213, ClipboardImage.png)

6db142 No.530474



If it's server-based, something like http:// arborjs.org/ for data visualization/selection would then fix the mapping problem and help a lot with the search problem.


There's also the Open Visual Thesaurus project, to maybe grab code/ideas from www.chuongduong.net/thinkmap/ to view the data, search, and walk through whatever else might be related in the data.

dbb4a4 No.530677


Here's a newer local archive that moves there.

I've put in some UI enhancements to the JSON Viewer HTML page. Seems to be working good. With a slight mod it could work with local json from any QCodeFag site or even direct from 8ch.

https:// anonfile.com/5ercH3d9ba/QJSONArchive_v1.zip

Getting the posts into 2 columns should be no problem. It's getting a reliable news source that is gonna cause you trouble.

I was planning on putting 3 columns in the viewer: QPosts, Times, DJTweets. In doing all this I've discovered a few things about 8ch/halfchan. The post IDs are not guaranteed unique. The best unique key is time, and even then I've found 2 posts that dropped at the same timestamp. Thematically I've been trying to key everything to time. [qposts, tweets, news]
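
Given that observation (post numbers not globally unique, and even timestamps can collide), a composite key is safer than either field alone. A sketch; the field names follow the JSON schema posted later in this thread, and the field choice is a suggestion, not an agreed convention:

```python
def post_key(post):
    """Composite identity: board, thread, post number and unix time."""
    return (post["source"], post["threadId"], post["no"], post["time"])

a = {"source": "qresearch", "threadId": 544266, "no": 544985, "time": 1520140647}
b = {"source": "qresearch", "threadId": 544266, "no": 544985, "time": 1520140647}
# post_key(a) == post_key(b), so duplicates can be dropped via a set of keys
```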

dbb4a4 No.530720


Jump in.

07564d No.530920


yEd can produce maps from spreadsheet data. That's one I know of.

https:// www.yworks.com/products/yed

Maybe when I get further along in the post tagging work, it'll be useful.

I'm toying with the idea of making my raw data available in some way, possibly in read only format. (Clowns can be destructive.)

07564d No.530978


I would like to be able to allow others to tag posts in my database. Any ideas on how to keep clowns from shitting everything up?

My initial thought is to allow suggesting of tags (similar to comment logic in the blog) with moderators making final decisions on them.

07564d No.530994


One of the big reasons I hesitate in making the entire database available is that a few of the images uploaded into the threads are obscene. I have no desire to inadvertently publish that sort of thing. When I'm publishing a reviewed subset, the chances of that happening are low.

00c874 No.532910


Perhaps?? just a guess.

Half Past Human .com

Absolutely the capability!

Discretion and interests match? Dunno.

00c874 No.532931


Is there an interest in pre-selecting data?

For example, select only posts identified on "notable posts" lists from each general #.

Plus, of course, any to-from links on those selected, chained.

Just asking. DB size, usability, etc.

Or is the data set also for researching shill/troll themes? It is a possibility, so I ask.

07564d No.534887


I'm working on that right now. I got started on this a week or so ago. I wrote a bit of code to travel back through context links, too. Hopefully, in a few days, I'll be able to repost my blog with the results of this work.

07564d No.534908


A bit more to say about that:

It's my plan to include items that reach back to a Q post together with that Q post when I can identify such. I may do a little pruning to keep the length of the entry associated with a Q post under control. Not everything in a context thread is important, after all. I may have to think about further arranging of things. I'll think more about that as I get closer to a point where I can implement such a strategy.

dbb4a4 No.536855

File: 1aca03a8df398a5⋯.png (92.13 KB, 1241x968, 1241:968, ClipboardImage.png)



So I managed to find the missing drops. My archive now has 827 total. As it turns out, the scraper was working as designed, filtering out Anonymous posts. The missing 2 for me were #823 and #819 when Q's trip wasn't working.

8143cc No.538741


>Half Past Human .com

Wow. That's a new one to me.

8143cc No.538775


>There are over 750,000 total posts from both sites and all boards containing Q related posts.

Yes, and that's the challenge: making the Q "related" posts searchable. Making Q's posts searchable is arguably not as important as making the body of related posts searchable, as that's where the body of knowledge resides.

"You have more than you know" taunts us with its promise. We get pointed to Loop Capital, or Stanislav Lunev. We need to be able to search/aggregate all of the posts over weeks/months with a single search. The dedicated research threads are great as far as they go but we're missing a lot of other info posted as snippets.

8143cc No.538787


>few of the images uploaded into the threads are obscene.

That does complicate it, but a lot of the information in the Q "related" posts is graphic. It seems culling of obscene content would need to be done manually to avoid throwing the baby out with the bathwater.

98bd4e No.540555


Good catch. I found some in my db as well.

I like the post headers in the UI. Nice and clean.

838074 No.541964


Yeah, qanonmap has had all of those for over a week now…

dbb4a4 No.543389


What is everybody using as their sources for drops? 8ch? One of the QCode forks? Something else?

How do we verify that our collections are the same?

I've been adding a Guid for each post I scrape, just to give them all a unique value.

98bd4e No.545176


qtmerge uses the raw JSON/HTML data where relevant from 8ch, 4plebs and trumptwitterarchive as its source data. It also merges in the JSON from qcodefag/qanonmap. It currently uses the host, board, post timestamp and post number to sync.

I like the idea of matching the GUIDs along with a post hash using some method we agree on.
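
The "post hash using some method we agree on" idea above could look like this: hash a canonical serialization of the stable fields so two archives can compare collections without exchanging full posts. The field choice and separator are assumptions, not an agreed convention:

```python
import hashlib

def post_hash(board, no, time, text):
    """SHA-256 over a canonical pipe-separated serialization of a post."""
    canonical = "%s|%d|%d|%s" % (board, no, time, text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

h = post_hash("qresearch", 544985, 1520140647, "NOT A REAL Q POST.")
# two archives computing this over the same fields get the same digest
```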

dbb4a4 No.547789


Oh shit. Qtmerge is scraping HTML pages? You are dedicated. I sourced stuff from qcodefag that I couldn't get json for.

Do you have the full bread sources?

dbb4a4 No.547826

Phonefag right now.



There's an md5 field, as you know, in the 8ch json, but it wasn't in the data I got from Qcodefag, because he'd modified the .com to strip HTML into a .text field.

My chanscraper keeps the md5 and the .com and strips HTML into .text.

Any C#fags here?

I did set up a GitHub yesterday and pushed the chanscraper out. Gonna get the Twitter stuff mashed in the next few days.
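
The com-to-text stripping described above (keep the md5 and .com, derive .text by dropping the markup) can be sketched with the standard library; the actual scraper is C#, so this is only an illustration of the step:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the visible text nodes from an HTML fragment."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_data(self, data):
        self.parts.append(data)

def strip_html(com):
    p = TextExtractor()
    p.feed(com)
    return "".join(p.parts)

com = '<p class="body-line ltr ">Also not a real Q post</p>'
# strip_html(com) -> "Also not a real Q post"
```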

dbb4a4 No.548084


Just ran my chanscraper again since apparently there were new posts last night as I was jacking around with Github.

I checked my posts with what's on qresearch and I think I'm good. Showing 839 total now.

New Q posts from 828 - 839.

I found a bug in the ChanScraper code too. A thing I've been working on that I forgot to remove. I'll push it out too and then link the GitHub.

dbb4a4 No.548229


Here's the link to my new GitHub

https:// github.com/QCodeFagNet/SFW.ChanScraper

If you are going to run the ChanScraper and then view the posts locally, when you open the QJSONViewer.html page, don't open the [json\_allQPosts.json] file, open the newly generated [bin\json\_allQPosts.json] file.

The machine needed me to include all the existing posts/work json. It's kind of clunky the way I'm doing it because I want to keep this updated with the latest posts/work json. But for a normal user everything is kept updated automagically in the bin\json folders. The project is set up to copy new files if newer - so everything should be kept in sync.

If you are planning on running this locally you'll need the .NET framework 4.5 at least. Probably better to go with 4.5.2

https:// www.microsoft.com/net/download/dotnet-framework-runtime/net452

dbb4a4 No.548433


You'll need Visual Studio free (at least) to build it unless you are a commandline master.

https:// www.visualstudio.com/vs/visual-studio-express/

98bd4e No.549377


Only HTML of archive pages.

07564d No.549586


Does your scraper work on the archive.is versions? These are the most complete most of the time since that is where so many of the pages were almost immediately saved by anons.

dbb4a4 No.550148


Tedious Dayum. Think you could convert your full bread scrape into some json?


Gotta link to one of the JSON files?


Here's a mini local JSON viewer as an HTML page + allQPosts.json. @225KB

Includes all QPosts up to 2018-03-04T11:29:14

https:// anonfile.com/06HeJbdeb6/Mini_Local_JSONViewer.zip

I was just thinking that what we really need to start off with is a single schema that we can all agree on. It will go a long way toward interoperability.

I'm going to run some tests on my local QCodeFag install and see if it will work off of the ChanScraper _allQPosts.json file. I think it should.

The JSONViewer could work with straight files from 8ch or 4ch with a single minor change I forgot to put in.

dbb4a4 No.550167


The ChanScraper includes the full JSON archive as of this morning. I haven't needed to go back to any archive.is HTML archives because I've been collecting breads locally since the beginning of Feb. All the Q Posts before that I sourced from the QCodeFag forks.

dbb4a4 No.550218


Here's what the JSON schema I'm working with looks like.



{
  "source": "qresearch",
  "threadId": 544266,
  "link": "https:// 8ch.net/qresearch/res/544266.html#544985",
  "imageLinks": [
    { "url": "https:// media.8ch.net/file_store/ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170.jpeg" },
    { "url": "https:// media.8ch.net/file_store/ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170.jpeg/B42CA278-6C32-4618-A856-0CB9B680CC38.jpeg" }
  ],
  "references": [
    {
      "source": "qresearch",
      "threadId": 0,
      "link": "https:// 8ch.net/qresearch/res/0.html#548166",
      "imageLinks": [],
      "references": [],
      "no": 548166,
      "uniqueId": "19294a1b-8cae-435d-9503-8eb70c573d6b",
      "_unixEpoch": "1970-01-01T00:00:00Z",
      "text": "\r\r>>548157\r\rAlso not a real Q post\r\rQ",
      "postDate": "2018-03-04T11:19:47",
      "time": 1520180387,
      "tn_h": 0,
      "tn_w": 0,
      "h": 0,
      "w": 0,
      "tim": null,
      "fsize": 0,
      "filename": null,
      "ext": null,
      "md5": null,
      "last_modified": 1520180387,
      "sub": null,
      "com": "<p class=\"body-line ltr \"><a onclick=\"highlightReply('548157', event);\" href=\"/qresearch/res/547414.html#548157\">&gt;&gt;548157</a></p><p class=\"body-line ltr \">Also not a real Q post</p><p class=\"body-line ltr \">Q</p>",
      "name": "Q ",
      "trip": "!UW.yye1fxo",
      "replies": 0
    }
  ],
  "no": 544985,
  "uniqueId": "35c759aa-4998-4009-83a7-2af1b3273f28",
  "_unixEpoch": "1970-01-01T00:00:00Z",
  "text": "\r\r>>548166\r\rNOT A REAL Q POST\r\rQ",
  "postDate": "2018-03-04T00:17:27",
  "time": 1520140647,
  "tn_h": 237,
  "tn_w": 255,
  "h": 1114,
  "w": 1200,
  "tim": "ffd6128f5949e4d4f6f3480236a63be002ffc5e59c0a31714360624d8ce45170",
  "fsize": 271479,
  "filename": "B42CA278-6C32-4618-A856-0CB9B680CC38",
  "ext": ".jpeg",
  "md5": "CbsCGk0pVEahunzSuV4LKw==",
  "last_modified": 1520140647,
  "sub": null,
  "com": "<p class=\"body-line ltr \"><a onclick=\"highlightReply('548166', event);\" href=\"/qresearch/res/547414.html#548166\">&gt;&gt;548166</a></p><p class=\"body-line ltr \">NOT A REAL Q POST.</p><p class=\"body-line ltr \">Q</p>",
  "name": "Q ",
  "trip": "!UW.yye1fxo",
  "replies": 0
}


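For interoperability testing, the nested references arrays in a schema like this can be flattened with a few lines of Python (a sketch against the field names in the sample above; q_texts is a hypothetical helper, not part of ChanScraper or QCodeFag):

```python
def q_texts(posts):
    """Flatten schema objects into (no, text) pairs, descending into
    the nested 'references' arrays shown in the sample schema."""
    out = []
    for p in posts:
        out.append((p["no"], p.get("text", "")))
        out.extend(q_texts(p.get("references", [])))
    return out
```

Any tool that agrees on the `no`, `text`, and `references` fields can consume the file this way regardless of who scraped it.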

98bd4e No.550251


Let me clarify, HTML for just the archive pages (to capture threads not in catalog/threads.json). JSON for everything else.

I'm working on how to share it, currently unoptimized and around 6 GiB of data uncompressed.

07564d No.551411


http:// archive.is/https:// 8ch.net/cbts/res/*

It doesn't look like archive.is does JSON. Your parser doesn't do HTML?

dbb4a4 No.553092


Yeah I've dug thru all the html looking for a reference to a json file. Can't find a reference to one either. My guess is, that once it drops off the main thread catalog, the JSON is no longer available. Too bad because that's the meat in a simple format.

No the machine is more of a scraper (grab data and save it) than a parser. It does parse the HTML out of the .com field into .text like QCodeFag does though. It's not designed to read thru html pages to look for posts.

It has a local baseline archive of everything. It reads in that entire local archive and then figures out the json breads it needs to download from the 8ch/qresearch/catalog.json. Then it downloads all those new breads and resets itself so you don't download everything every time - only the breads from the past [x] days.
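The catalog-diff step described above can be sketched in a few lines (assuming the vichan-style catalog.json layout of pages, each carrying a `threads` array; the function name is made up for illustration):

```python
def new_thread_ids(catalog, have):
    """Return thread numbers present in catalog.json but not yet archived
    locally. 'catalog' is the parsed catalog.json (a list of pages),
    'have' is the set of thread numbers already in the baseline."""
    fresh = []
    for page in catalog:
        for thread in page.get("threads", []):
            if thread["no"] not in have:
                fresh.append(thread["no"])
    return fresh
```

Each returned number then maps onto a res/{no}.json download, so only new breads are fetched per run.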

dbb4a4 No.553109


You've got a database? I assume that's with all the images as blobs?

dbb4a4 No.554074

Here's an updated mini local JSON viewer as an HTML page + allQPosts.json. @225KB

I updated it so it works with the raw json from 8ch.

https:// 8ch.net/qresearch/res/553655.json

Could probably use an [ascending/descending] button but…

Includes all QPosts up to 2018-03-04T11:29:14

https:// anonfile.com/z4U1Jdd9b9/Mini_Local_JSONViewer.zip

If folks don't like a zip, it's only 2 files they can download the HTML file (ChanScraper) and the allQPosts.json (Console\bin) file on github https:// github.com/QCodeFagNet/SFW.ChanScraper

07564d No.554309


My images are kept as separate files in original form. Only the links are kept in the database. Here's the record definition for MySQL:

CREATE TABLE `chan_posts` (

`post_key` varchar(31) NOT NULL COMMENT 'site/board#post (post is set to length 9 with . fill.',

`thread_key` varchar(31) NOT NULL COMMENT 'site/board#thread (thread is set to length 9 with . fill.',

`post_site` varchar(19) NOT NULL COMMENT 'For editor post, use editor. For spreadsheet, use sheet.',

`post_board` varchar(15) NOT NULL COMMENT 'For editor post, use editor. For spreadsheet, use sheet.',

`post_thread_id` int(10) UNSIGNED NOT NULL COMMENT 'For editor post, use 1. For spreadsheet, use row.',

`post_id` int(10) UNSIGNED NOT NULL COMMENT 'For editor post, use next available. For spreadsheet, use column converted to number.',

`ghost` int(10) UNSIGNED DEFAULT NULL,

`post_url` text,

`local_thread_file` text,


`post_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,

`post_thread_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,

`post_text` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,

`prev_post_key` varchar(31) DEFAULT NULL,

`next_post_key` varchar(31) DEFAULT NULL,

`wp_post_id` int(11) UNSIGNED DEFAULT NULL,

`post_type` set('editor','q-post','anon','approved','high','mid','low','irrelevant','timeline') NOT NULL DEFAULT 'anon',

`flag_use_in_blog` tinyint(1) NOT NULL DEFAULT '0',

`flag_included_on_maps` tinyint(1) NOT NULL DEFAULT '0',

`flag_included_in_bread` tinyint(1) DEFAULT NULL,

`flag_bread_post` tinyint(1) DEFAULT NULL,

`flag_relevant_img` tinyint(1) DEFAULT NULL,

`flag_relevant_post` tinyint(1) DEFAULT NULL,

`author_name` text,

`author_trip` text,

`author_hash` text,

`author_type` smallint(6) DEFAULT NULL,

`img_files` json DEFAULT NULL,

`link_list` json DEFAULT NULL,

`video_list` json DEFAULT NULL,

`editor_notes` text,

`tags` text,

`people` text,

`places` text,

`organizations` text,

`signatures` text,

`event_date` datetime DEFAULT NULL,

`report_date` datetime DEFAULT NULL,

`timeline_title` tinytext

);

ALTER TABLE `chan_posts`

ADD PRIMARY KEY (`post_key`),

ADD KEY `post_id` (`post_id`),

ADD KEY `thread_key` (`thread_key`),

ADD KEY `site_board` (`post_site`,`post_board`);

I'm considering making the database publicly available. I need to figure out how much space it will take up and whether it will fit within my current hosting plan. At present, I have over 880,000 posts in the database. The size of the database file for just this table without the images is 1.1GB. There's another GB of images, but that covers only the fraction of posts that are Q posts, bread posts, and the context posts related to these.
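Going by the column COMMENTs above, a `post_key` is `site/board#post` with the post number filled to length 9 with dots. A sketch of building one (the fill side is an assumption; the COMMENT doesn't say which end gets padded):

```python
def post_key(site, board, post_no):
    """Build the site/board#post key per the chan_posts column COMMENT.
    The post number is right-filled to 9 chars with '.' (padding side
    assumed); the result fits the varchar(31) column."""
    return "%s/%s#%s" % (site, board, str(post_no).ljust(9, "."))
```

The same scheme with the thread's opening post number would give `thread_key`.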

07564d No.554376


I guess I should start uploading. I've got the unlimited plan. Anyone want to write the search feature for it? Preferred language is PHP.

98bd4e No.555095


For now it just uses a dedicated file system.

With images gathered so far this mirror's total size is 193 GiB.

dbb4a4 No.560076


holey phuck. 193 GB. That's for a full archive of all breads + images? My local scrape of Q breads and posts as text only comes in at 6MB. My local QCodeFag install with text + Q images is just under 100MB.

193GB is getting unmanageable.

98bd4e No.560415


Yes, unoptimized and incomplete.

07564d No.564762


Not unmanageable. Just big. Maybe every thread needs its own directory for its images. And maybe the data needs to be moved to my other drive locally.

07564d No.564862

I'm working on the export files now. I need to change the posts just a bit before I can make them public.

I promised that no links would go to 8ch and particularly qresearch, and also that I would redact mentions of them from the content. I already do this on my blog, but I simply broke the links rather than made them go somewhere else. To get the most out of the republishing of the posts, I need to convert the >> and >>> links so that they link to posts stored on my own site. This is probably better anyway since many posts and threads are now missing from their original locations.

dbb4a4 No.568187


Yeah it's not totally unmanageable. It's more like moving a full grown oak tree. You can do it, but it's a huge pain in the ass. I was thinking more in terms of moving it around the internet or hosting. That's a pretty big db.

I rejiggered the ChanScraper to archive all the breads even if there isn't a Q post in that bread. It rendered 215 NEW complete breads and brought my JSON net filesize from 6MB to 200MB. Starts around "Q Research General #358".

That's with no images, just the raw JSON from 8ch. Each bread is around 700kb.

98bd4e No.568666


I did some research on collecting the CBTS threads from 4chan/pol the other night and the results might be useful for others. They can be found at the bottom of the page here:

https:// anonsw.github.io/qtmerge/catalog.html

It's still a work in progress.

dbb4a4 No.568861



I may be able to give you a list of all those links from the data I have from QCodeFag

dbb4a4 No.569061



nevermind looks like you got it covered. nice!

07564d No.569170


Yes, the breads are essential. I've got them going back all the way through 4chan stuff. The breads are how you connect in the answers. If you connect up the contexts, most of them link back to a Q post at some point. Then the context of that post that was linked into the bread can be associated with the Q post. That is what I was working on before I started looking at making my entire database available for research.

98bd4e No.569329


Were you able to capture any of the original 4chan JSON/HTML data? I wasn't researching Q at that time so I've relied on 4plebs.

d6b0f8 No.569596


I have created a searchable application for /qresearch/.

The database is filling right now. I kept only the image attachments in order to save hard disk space.

At present 52,000 of the most recent posts on qresearch are loaded in the table with the attachments. We'll see how the storage works out.

I'll advise when anons can attempt to use the system.

07564d No.569793


I've got most of it, yes.

07564d No.569900


I don't know if y'all noticed, but I've got several columns in my database that are not part of the original data. Some of these are tagging fields: `tags`, `people`, `places`, `organizations`, and `signatures`. It would be difficult to automate the filling of these fields, but I don't want to entirely open up editing of these fields to anons, either, due to the potential of clown interference. There's no way I can fill all of them in myself. I have an idea to allow tags to be suggested and then allow up-voting and down-voting and coming up with an acceptance criteria before giving them a permanent place in the data record. Or maybe just leave them in that form with their ratings.
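The suggest-then-vote idea could look something like this (a sketch only; the threshold and the net-score criterion are assumptions, since the post deliberately leaves the acceptance criteria open):

```python
def accepted_tags(suggestions, min_net=3):
    """suggestions maps tag -> (up_votes, down_votes). Accept a tag once
    its net score clears the (assumed) threshold; everything else stays
    pending with its rating, as the post suggests."""
    return sorted(tag for tag, (up, down) in suggestions.items()
                  if up - down >= min_net)
```

Keeping rejected tags visible with their scores, rather than deleting them, would preserve the audit trail against clown interference.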

98bd4e No.570074


Excellent. Will that raw JSON data be in the DB as well?


I did notice, those are great ideas. Can I suggest letting each user have their own copy/edits of the metadata? The user-specific data could then feed back into the system for suggestions to others, etc. But primarily it gives the user some way to control the interference/noise.

dbb4a4 No.570566


What JSON are you looking for anon? Bread before 2/6/2018?

dbb4a4 No.570604

I've rejiggered the ChanScraper to produce TwitterSmashed json. It includes any DJTweets within 60 mins of a Qpost. Here's what [5], [8], and [10] deltas look like.


"DJTtwitterPosts": [


"accountId": "realDonaldTrump",

"accountName": "Donald J. Trump",

"tweetId": 944665687292817415,

"text": "How can FBI Deputy Director Andrew McCabe, the man in charge, along with leakin’ James Comey, of the Phony Hillary Clinton investigation (including her 33,000 illegally deleted emails) be given $700,000 for wife’s campaign by Clinton Puppets during investigation?",

"delta": 5,

"link": "https:// twitter.com/realDonaldTrump/status/944665687292817415",

"uniqueId": "00e6951d-5f49-455b-bdd9-bda7f184d9c7",

"time": 1514060825,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:27:05"



"accountId": "realDonaldTrump",

"accountName": "Donald J. Trump",

"tweetId": 944666448185692166,

"text": "FBI Deputy Director Andrew McCabe is racing the clock to retire with full benefits. 90 days to go?!!!",

"delta": 8,

"link": "https:// twitter.com/realDonaldTrump/status/944666448185692166",

"uniqueId": "92fbb1a2-169e-412c-abba-6e441d3acbaa",

"time": 1514061006,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:30:06"



"accountId": "realDonaldTrump",

"accountName": "Donald J. Trump",

"tweetId": 944667102312566784,

"text": "Wow, “FBI lawyer James Baker reassigned,” according to @FoxNews.",

"delta": 10,

"link": "https:// twitter.com/realDonaldTrump/status/944667102312566784",

"uniqueId": "eabb202f-3b59-48c9-b282-f0110b8388a5",

"time": 1514061162,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:32:42"



"no": 158078,

"name": "Q",

"trip": "!UW.yye1fxo",

"sub": null,

"com": null,

"text": "SEARCH crumbs: [#2]\nWho is #2?\nNo deals.\nQ\n",

"tim": null,

"fsize": 0,

"filename": null,

"ext": null,

"tn_h": 0,

"tn_w": 0,

"h": 0,

"w": 0,

"replies": 0,

"md5": null,

"last_modified": 0,

"source": "8chan_cbts",

"threadId": 157461,

"link": "https:// 8ch.net/cbts/res/157461.html#158078",

"imageLinks": [],

"references": [],

"uniqueId": "e22306cc-2831-453a-ae1d-16e90aa23707",

"time": 1514060541,

"_unixEpoch": "1970-01-01T00:00:00Z",

"postDate": "2017-12-23T15:22:21"

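The [5], [8], [10] deltas in this sample are reproducible by rounding the timestamp difference to the nearest minute (a sketch; whether TwitterSmash rounds or truncates isn't stated, but rounding matches the sample data):

```python
def tweet_deltas(q_time, tweet_times, window_min=60):
    """Minutes from a Q post to each tweet (unix timestamps), keeping
    only tweets within the stated 60-minute smash window."""
    deltas = []
    for t in tweet_times:
        minutes = round((t - q_time) / 60)   # rounding assumed
        if 0 <= minutes <= window_min:
            deltas.append(minutes)
    return deltas
```

Run against the `time` fields above (Q post 1514060541, tweets 1514060825 / 1514061006 / 1514061162) this reproduces the recorded deltas.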

98bd4e No.570634


4chan JSON for pol between 2017-10-30 and 2017-12-01.

dbb4a4 No.570660


I'll keep my eyes peeled. Finding old JSON for those days is hard. Is 12-1 when you started archiving? Got bread json < 2-6-2018?

07564d No.570766


I could develop an export, I suppose. But that's low on my list of priorities at the moment. The data structure is posted above in the thread. Minor alteration needed: my host does not support JSON fields. Substitute TEXT, and you should be good. If you want to write an exporter, I can review it and include it.

But I still don't have the data up there yet. I'm working on the alterations to the data needed to keep everything on site at the host.

07564d No.570809


I was thinking of attaching the IP address to each suggestion to keep the up-votes and down-votes honest. Is that enough? Or maybe even too much? The other thing I could do is perhaps tie in the WordPress login system, since it's there anyway. It might take a bit of time for me to figure out how to limit permissions.

98bd4e No.570874


Thanks, 4plebs is good for now, but a second witness is preferable. Started archiving Feb 15, but some old data was still available at the time.

For 8ch these are the oldest breads I have:

pol: 10509790 (2017-08-28)

cbts: 10 (2017-11-21)

thestorm: 1 (2018-01-31)

I don't have all breads after though, it is incomplete.

I've since stopped archiving pol/cbts/thestorm to save time/space.

98bd4e No.570944


Not enough due to VPNs, DHCP, etc. The login may be the best way.

dbb4a4 No.570983


I think you and I started archiving those about the same time. I've got complete json breads from 2/6/2018 to now. if you want any of that.

98bd4e No.571020


I might already have it, is it in the QJsonArchive.zip from earlier?

dbb4a4 No.571161


Ya - you probably have the breads from the last few days eh?

98bd4e No.571229


I do, I'll call your dataset QCodeFagNet unless you want a different name. Instead of the zip I'll pull it from your github.

dbb4a4 No.571310


Sounds fine. I'll try to keep it updated.

07564d No.575021


Logins require email addresses. I guess it's always a choice whether to participate.

d6b0f8 No.583035


Q Research General - searchable archive breads 716-477 presently online.


username qanon

password qanon

updates as I find them

70e498 No.596604


Looking good

6a9543 No.598094

File: df457dd3420fb52⋯.jpg (101.52 KB, 500x522, 250:261, 1487336933873.jpg)

There so much content being produced now that it should be compiled into a wiki in a dedicated thread. The other threads investigate and make the content, this one adds the best content into one big archive, updated in real-time ofc bc they never stop why should we pic related.


To take Q's work to the next level we have to increase the public's basic awareness of the criminality being exposed, investigated, and terminated, by an order of magnitude. That order of magnitude is pretty normal people.

>be a normal person

>want to do the right thing but get a link to this Q thing and there's too much complex and """scary""" info what with muh job and family and everything else

>the big load of content is overwhelming and i don't know where to begin and have it be easy

<make 1 entry point to begin browsing the entire body of accepted content

<terse organization keeps it brief and saves the details for a leaf page a click away, as deep as is necessary

<keep source of body of accepted content continuously up to date

<using https for minimal integrity protection

>now i can begin a review of the evidence contained in the case file archive with a single click! jeff bozos eat your heart out nigger

>and look at short well-organized and sourced text, and pictures, and the odd video

>and easily get a run down on whatever topics i browse my way upon

>and now even though my eyes have been opened in a pretty dramatic way, it was easy to use and i know it'll be easy to share, to the topic level

70e498 No.602595


I hear you anon.

The key is the content. We have the ability to archive threads/qposts. Posts that Q references. Tweets. Known tripcodes/twitter accounts.

What is the source of all the evidence? The dedicated research threads? Notables? In order for it to be automagic, there needs to be a reliable single source here on 8ch. None of the codefag work I've seen reaches a level of what could be called AI - or the ability to discern which anon has posted a certifiable answer/evidence.

Non-automated means anonomated, but that causes its own set of issues.

I agree a wikipedia style thing would be good because it's familiar, but populating it with data may be an issue. Some of it's going to have to be entered in manually.

If all you are looking for is a location for an anon wiki, I think that's pretty easy.

6a9543 No.603402


No, not automated, curated.

98bd4e No.603568


Should I hit _allQPosts.json?

07564d No.605608

I'm stuck. I'm working on getting that database up for you, but I have to make some modifications to the `post_text` field so that those links don't come here to 8ch. (I promised that I wouldn't do that.) I'm trying to fix the `post_text` field so that the >> links refer back into the database, but I'm not familiar enough with the DOMDocument and related classes in PHP. Are there any good tutorials out there on how to do advanced manipulation of HTML using these classes? The reference manual stuff just isn't doing it for me.

07564d No.605926


I should clarify something. Not only am I going to make the existing links self-reference, but I'm also going to revive those dead >> links and point them back into the database. I've got many of the deleted threads in my database, too, and I can make those available.

70e498 No.612945


Ya that's fine. I'm going to update that today to cover the latest.

I've been working on a new local viewer that uses the twitter smashed data. It shows the delta + alt text of the tweet + a link to the tweet. I've noticed that a lot of the image links I have are currently broken. I was thinking I'd just update those to point to one of the other QCodeFag branch archives rather than try and archive all the images as well.

Expect an update on GitHub later

70e498 No.613236

File: f9d167645faf34b⋯.png (129.03 KB, 830x723, 830:723, ClipboardImage.png)


Here's what it looks like. Just trying to finish off a sort idea and clean data.

07564d No.613641

Good news! I've got the code working which makes the post links compliant and refer back into the database. Almost as soon as I posted the request, it came to me that I was making things more complicated than they needed to be and a better algorithm came to mind. The algorithm is so good that in cases where good posts didn't link in 8ch, they will be linked on my site. That includes links such as the one Q pasted into the middle of a word the other day or when they are consecutive with or without comma or white space. Anywhere there is a >> followed by a bunch of digits, a link should be created. The only exception is where the post number of the link is greater than the post number of the current post. This type of error was encountered in early posts after the transition from one board to another. Anyway, I'm going to run a few more quick tests, and then I should be uploading to my host within a few hours. I still don't have code ready to search it, though.
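That "anywhere >> followed by digits" rule, with the forward-reference exception, can be sketched as a single regex pass (the href template here is hypothetical; the anon's actual PHP/DOMDocument code isn't shown):

```python
import re

def linkify(text, current_no, href="/research-tool.php?post=%d"):
    """Replace every >>NNN with a self-hosted link, except forward
    references (link number greater than the current post's number),
    which were board-transition artifacts in early posts."""
    def repl(m):
        n = int(m.group(1))
        if n > current_no:
            return m.group(0)   # leave the artifact untouched
        return '<a href="%s">&gt;&gt;%d</a>' % (href % n, n)
    return re.sub(r'>>(\d+)', repl, text)
```

Because the pattern doesn't require surrounding whitespace, it also catches links pasted mid-word and consecutive links with or without commas, as described above.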

70e498 No.613892


When you get that worked out make sure to let us know. I've been wondering about that myself. The early halfchan no's are pretty big. I've found some bugs in my code around there being multiple references per Q post. It does happen on occasion and my scraper isn't catching them all.

I've just uploaded a bunch of json data to the https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON gihub. The json folder is what's generated when you run the ChanScraper, the smash folder when you run the TwitterSmash. Each of those folders has a Viewer.html file that can be used with just the _allQPosts.json or _allSmashPosts.json.

Like I said I need to clean up some dead image links for everything to be working right.

07564d No.618146


You MIGHT be able to get thumbnails from archives, but you won't get full size images there, for the most part.

70e498 No.620330


Ya think it's bad form to go lazy and link em to one of the qcodefag archives?

07564d No.622768


Part of making those offline archives is storing the items. Plus, don't assume any platform is forever. There are too many clowns out there who don't want anyone to see this stuff.

So now I've got a bunch of export files of my database ready to upload. Next challenge: Automating the import on the hose.

07564d No.622903


>import on the hose.

Do clowns alter typing?

07564d No.625024

The table of posts has been added to the database. It's all up there. (All I have, anyway.) I need to get a way to make searches available to you now.

70e498 No.632885


So you have all the breads searchable as well?

07564d No.648528


Everything is searchable. The database includes all posts I could find. I'm working on the search front end right now.

07564d No.648594

File: 6b4b674ff054cee⋯.png (82.26 KB, 1231x1217, 1231:1217, Q-Research-Tool.png)


This is what the front end looks like right now. I'm working now on turning that into a SQL statement that can search the database. I'm only an hour or two from putting this online.

07564d No.649300


It's up there. The paging isn't working yet, so don't anyone complain about that. I'll fix it in the morning. I also discovered that a key range of posts didn't import properly. I'll fix that in the morning, too. For now, I've set the posts per page to 2000, which may cause timeouts, but it will allow people to play with things a bit.

http:// q-questions.info/research-tool.php

cc8139 No.649479


ANON, great work.

70e498 No.650810



This crosses all breads? If so then this is exactly what we need. I can help you with the SQL if you need it.

SELECT * FROM tbl LIMIT 5,10; # retrieves rows 6-15; you should also specify an ORDER BY


How are you getting the breads? Maybe I can work out a way to get you those. Combine up somehow

70e498 No.651528


I've been thinking about this. Preliminary research shows that elasticsearch and lucene would probably be the best match for what we've got. There are a lot of tools that pile into elasticsearch. Any hostfags here with the ability to set up an elasticsearch node?

The data is big. Tons of images. A proper archive takes space. I'm holding @546 complete breads and with no images it's 250MB+. That's for like a month. By the end of the year the bread collection alone is going to be over 1.5GB.

The images I've got so far come to around 100MB, but that's just from the Q posts - and even then I know I'm missing some.

Econ Godaddy hosting is like $45 a year. I'm thinking about just putting the chanscraper/twittersmash online, then write some simple apis. Get thread#, filteredThread, qpost# that kind of thing. Useful or no?

07564d No.652644


My algorithm for getting breads is this:

1. Get the author_hash for the first post in a thread.

2. Mark the first posts in the thread that match that author_hash until the author hash doesn't match.

If someone jumps in before the baker is done, oh well. But that shouldn't be much of a problem because the breads get repeated a lot. I can mark posts as bread later, if need be.
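That two-step heuristic can be sketched as follows (the `author_hash` field name comes from the table definition above; this is an illustration, not the anon's actual code):

```python
def mark_bread(posts):
    """Flag the leading run of posts whose author_hash matches the OP's.
    Once a different hash appears (someone jumped in before the baker
    finished), nothing later in the thread is counted as bread."""
    flags, in_bread = [], bool(posts)
    op_hash = posts[0]["author_hash"] if posts else None
    for p in posts:
        if p["author_hash"] != op_hash:
            in_bread = False
        flags.append(in_bread)
    return flags
```

Misses are cheap to correct afterward since the flags live in the post records.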

70e498 No.654567


Hmm… When I say bread I mean a full Q Research thread. Like this

https:// github.com/QCodeFagNet/SFW.ChanScraper/blob/master/JSON/json/8ch/archive/651280_archive.json

That's the straight bread/thread from 8ch. It includes all the responses whether the BV posted it or not.

I'm finding those by getting the full catalog from

https:// 8ch.net/qresearch/catalog.json, finding the breads/threads that have q research, q general etc in them, and then getting the json for that thread only from https:// 8ch.net/qresearch/res/651280.json

I think I see what you are doing - going thru and trying to mark the relevant posts?

07564d No.654852


I haven't even looked at that.

Paging is fixed, plus I gave you a couple other search parameters.

I'm still working on the import issue, but I at least have put the posts I initially identified as missing up there.

07564d No.654901


>I think I see what you are doing - going thru and trying to mark the relevant posts?

Yes. Most of it is done automatically. Since I save the marks in the post records, I can go back in there and adjust it, if necessary.

5f9a22 No.663255


>Useful or no?

I'm not the guy to ask. The discussions here went over my head immediately. Looks like there's some serious progress being made here:



One question I have for contributors here is when there is a consensus that you have created a viable search tool, how will you manage promulgation? Do it like a war room announcement on qresearch?

As many have noted, the search tool has to be hardened against tampering before release. Clowns/shills are devious and destructive.

70e498 No.664801


I agree on shill proofing.

I've been playing around with a webAPI. I've got it working nice with all the q posts, looking for a specific post# like #929, and posts on a day. Returns json or xml. This is the Crumb Archive.

My plan is to expand that so that the archived breads can be accessed as well - each as a single json file. This is the Bread Archive.

I'm going to set it up where it's an autonomous machine. It will scrape and archive automagically moving forward from the current baseline. No delete. No put. No fuckery.

I'm pretty sure it would work with the QCodeFag scraper repos.

The bread archive is pretty big. I'm sure there's no way I can archive images for all the breads. An image archive isn't what I've been focused on. The focus of this is only making the json/xml available from the chanscraper.

Once I can get the breads all up and being served automagically my plan is to set up an elasticsearch node and suck all the breads in.

I figure a year of godaddy hosting is currently $12 with unmetered bandwidth. I'll throw in.

07564d No.664904


Yes, I'm concerned about that, too.

Perhaps it helps that this data does not reside only there?

In this case, it would take me about half a day to get it all up there again, if need be.

d6b0f8 No.664928


Searchable Qresearch


username: qanon

password: qanon

Updated regularly with the messages and images from Qresearch general.

07564d No.664960

I'm beginning to wonder if I'm up against some kind of limit on my remote host. I just tried importing into it again, and I'm still missing some posts.

Remote host: 1,010,127 records

Local machine: 1,049,610 records

d6b0f8 No.664975

I'm using the 8chan JSON API endpoints. I still need to pull from the archive.json file downloaded yesterday.

My server is on a linode so I have fast response time.

07564d No.664984


Maybe I can split the table into 4chan and NewChan (my name for 8ch, since we can't link back to here) and see if they all go up.

d6b0f8 No.664990

You can search the text in the posts with wildcards. Say you want all posts with the word BOOM. Just enter *boom*.

Say you want the posts from Q with his tripcode and "boom"

Put !UW.yye1fxo in the trip code.

put *boom* in the comment

Click search button


d6b0f8 No.664998

Has anyone found a way to go back past the 25 pages in the catalog.json?

d6b0f8 No.665010


Can I access this? I'd like to add the DJT tweets into the database. Twitter is wanting more and more data before they give me an API key.

d6b0f8 No.665018

File: c73b9e4f4cc2fd3⋯.png (247.29 KB, 1944x1226, 972:613, ClipboardImage.png)

d6b0f8 No.665029

File: f1fe9abb9325de7⋯.png (297.77 KB, 1941x1224, 647:408, ClipboardImage.png)

d6b0f8 No.665054

File: 04c25cb69211184⋯.png (16.81 KB, 480x165, 32:11, ClipboardImage.png)

U.POSTS.NEW is the new-format table.

U.POSTS.NEW.ATT is the table of attachments for the primary table. Each one is a link to a binary.

5f9a22 No.666471


Wow, awesome job! I knew it could be done. I'm going to need some help getting started. Could you put a qsearch for dummies tutorial together?

Did you have to create, or did this create a chronological list of all Q related threads and their titles if any? (/pol/cbts/CBTS(8ch)/The Storm/ qresearch)?

That might be a good Mnemonic to speed searches.

ee0f91 No.666874

how about:

-archive threads as they go

-convert to text files, with links to posts

-txt files are easily searchable

d6b0f8 No.666959


I've not been back into this thread for a while. I'm running the qresearch import process to get up-to-date. One technique that is needed is to re-scan already imported threads for posts missed during initial scans.

Threads are imported from the catalog.json file. In this state, we know the thread number and the number of messages at that time. The only time we know a thread is closed is when the number of posts >= the number in the official "bake" count.

Therefore, my program keeps testing until the posts counter >= the bake counter and then marks the thread as complete in the thread table. This then prevents re-scanning all threads because we get only the open ones.

Multiple scans of posts are needed to get all of them and to deal with duplicate threads.

I use the 8-chan post number as part of the primary key to the threads and posts tables.

8GA_1 is 8chan Great Awakening post 1

8QR_655000 is 8chan Qresearch post 655000
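That key scheme reads as a short board code plus the post number; a sketch of building it (the code-to-board mapping beyond the two examples given is an assumption):

```python
# Board codes taken from the two examples above; other boards assumed
# to follow the same pattern.
BOARD_CODES = {"greatawakening": "8GA", "qresearch": "8QR"}

def pavuk_key(board, post_no):
    """Build the composite primary key used in the threads and
    posts tables, e.g. 8QR_655000."""
    return "%s_%d" % (BOARD_CODES[board], post_no)
```

Embedding the board in the key keeps post numbers unique even where two boards reuse the same number.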

The big problem is going back to find threads BEFORE the last 25 pages in the catalog.json. Therefore, I can't get anything earlier than when I first wrote the import.

d6b0f8 No.666983

The import routine uses the JSON API endpoint from the boards. In the JSON is the Unix timestamp of the message. This is a native field/object type in Pavuk. Thus all timestamps are set to UTC internally.

NOW, if I could get DJT's Twitter feed in JSON, it also has UnixTime and this goes in directly.

Twitter wants me to give them all sorts of documentation before they will allow me to use their API. Frankly, I don't have the time to deal with them or the inclination.

d6b0f8 No.666995

I can get other boards provided the endpoints are similar and that the catalog.json file still has links to the threads.

BO has never responded to my requests on how to get older threads.

d6b0f8 No.667022

Super simple.

Entry forms are also search forms.

Enter the data that you wish to match.

Click the search button.

Pavuk creates and then executes the appropriate query and returns the items in a Kendo grid. Scroll, resort, export to excel or click on a row to return to the entry form with your data.

*searching on timestamps has issues that i need to resolve*

d6b0f8 No.667026

I'm done for the day.

d6b0f8 No.667043

File: bc982c3469bafd6⋯.png (109.82 KB, 781x448, 781:448, ClipboardImage.png)

d6b0f8 No.667053

File: f6665239b502443⋯.png (394.1 KB, 1393x926, 1393:926, ClipboardImage.png)

d6b0f8 No.667075

The comments from the JSON API include markup and JS to go to real links. This is a problem with the storage and search. I pipe the comment string through Lynx with the -dump option and this gives me clean text in STDOUT and then a separator and then the list of actual links. I put the text in the comments and the links in a multivalue table. I'll expose the links tomorrow as a separate tab in the entry form.
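For anons without Lynx, the same kind of dump (clean text first, link targets listed separately) can be approximated with the Python stdlib HTML parser; this is a sketch, not the Pavuk pipeline:

```python
from html.parser import HTMLParser

class ComDump(HTMLParser):
    """Collect visible text and href targets from an 8ch 'com' field."""
    def __init__(self):
        super().__init__()
        self.text, self.links = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.text.append("\n")            # each body-line <p> is one line
        elif tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")
    def handle_data(self, data):
        self.text.append(data)

def com_to_text(com):
    """Return (clean_text, links) for one comment's HTML markup."""
    d = ComDump()
    d.feed(com)
    return "".join(d.text).strip(), d.links
```

The clean text goes into the searchable comment column, the links into the multivalue table, mirroring the Lynx -dump split described above.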

70e498 No.667648


What about 100k transactional batches?

5f9a22 No.667776


Jesus Einstein, give us a starting point to keep up.

70e498 No.667886


Yeah man hit it. I've got a github here you can browse around.

https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON

json/8ch has the filtered/unfiltered bread and archives in it. smash has the twittersmashed posts. I've been getting my twitter data from http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json

I set up a test for the webAPI twittersmashed posts here https:// qcodefagnet.github.io/SmashViewer/index.html

I'm getting close on having the webAPI thing finished up. Just running some more tests and then I should be ready to go.

70e498 No.667927


Yeah you could mebbe use the smashed json from me. I've already done the unix timestamp on the trump tweets. All 8ch posts and Twitter posts derive from the same Post base object with the unix timestamp built in.

70e498 No.667971


I think that's because you can't really get them. There is an 8ch beta archive here, but all the Q Research threads disappeared shortly after we started archiving them. Even then, those archives are straight HTML. It's of no use to me. AFAIK, once it slides off the main catalog, it's pretty much gone. Some trial and error got me a few breads, but not many.

5f9a22 No.668375


>BO has never responded

I'm not the board owner, just some schmuck who started a thread he thought was being overlooked. You folks are so far out of my ballpark all I can do is try to keep it inside the curbs of what my original intent was.

I'd like to see a list/catalogue/file of all Q "related" posts.

Aaand I'd like to see a list of post Q "related" posts across all platforms/threads made searchable. Plenty of focus on Q, we need the early digging and free association.

70e498 No.668958


Interesting concept you have anon. You want to be able to search across ALL 8ch? Not just Q Research? By platforms are you talking 4ch/8ch? or 4ch/8ch/twitter/reddit/facebook…?

07564d No.670142


The first time I uploaded, I batched them in by 1000.

The second time, I batched them in by thread. I'm not sure how well the LIMIT clause on the SQL works.

In any case, I may have a problem on both computers. I could have sworn I had over 1.1 million records the other night. (Not to worry. I still have all of the source.) The solution may be to partition the table. I won't have to rewrite any code, but it'll chunk the table's file down into smaller sections.

This should be interesting. I've never had to partition a table before. Apparently, newer versions of MySQL do it automatically. But until then, it's gotta be done.
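For what it's worth, the batched-insert approach described above (1000 at a time, or one thread at a time) boils down to a chunking generator like this sketch; the actual INSERT calls are omitted:

```python
def batches(rows, size=1000):
    """Yield fixed-size chunks of rows, e.g. one multi-row INSERT per chunk."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # the final, possibly short, batch
```

Keeping each chunk small also makes it easier to spot a single bad record that kills an insert, as mentioned further down the thread.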

07564d No.670158


Mine has 4chan, too.

07564d No.670221


If threads are missing, you have to look in archive.org/web or archive.is. Of the two, archive.org/web is better for scraping because the HTML code is about as close to the original as they can make it. I can actually use the same scraper program on it.

Since the stuff that is on archive.is is so different from the original, I will need to write a new scraper for those. On several occasions, the post was important enough that I rebuilt it by hand.

With either archive, you need to know the URL, which can be tricky sometimes. Just having the post number won't do it. You must know the thread as well.

Just thought of something: When I get threads from these archive sites, what time zone do they show? I believe my stuff is saving to GMT when I save a post directly from a chan site. I'm not sure what I'm saving when I get posts from these archives.

70e498 No.670279


I would think the time is relative to the archive home timezone. That is, unless archive.x has done some wizardry to change the time zone it's pulling at to be the time zone of the user requesting the original archive. That would be more problematic - but you could still deal. It should be marked what time zone and then you convert into the unix timestamp.
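That conversion step might look like the following sketch (Python 3.9+ for `zoneinfo`; the zone name is whatever the archive turns out to be marked with, so treat it as an assumption):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

def to_unix(stamp, tz_name):
    """Turn an archive's local-time string into a Unix timestamp.

    Once the zone is known (marked on the page, or assumed), the result
    lines up with the UTC Unix times already stored for chan posts.
    """
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return int(local.replace(tzinfo=ZoneInfo(tz_name)).timestamp())
```

For example, 13:09:37 US Eastern on 2018-03-23 (daylight time, UTC-4) and 17:09:37 UTC both map to the same Unix timestamp.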

70e498 No.670289


The 4ch breads or the 4ch Q posts?

70e498 No.670304


What are the chances it's hanging on a specific record? I see that all the time doing inserts. Bad data kills it off.

70e498 No.670317


You could look into raising the timeout. Mebbe it's just such a long job that it's taking too long and timing out? https:// support.rackspace.com/how-to/how-to-change-the-mysql-timeout-on-a-server/

07564d No.670322


Here's a hint for how to find the post a dead thread belongs to: Go to the earliest archive of the thread on which you found the link, which will usually be on the archive.is site. If you're lucky, the link was still live when the thread was archived. The other thing to do is search earlier posts that you already have to see if someone else linked the same post.

07564d No.670332


Time out isn't the problem in this case. Since I'm working with small batches at a time, they're quite quick.

07564d No.670388


I have the vast majority of both. Go check it out.

http:// q-questions.info/research-tool.php

After I resolve the table size problem (which is what I think the real problem is), I think it would be good to work some more on my contexting program. On my local computer, I've got it so that it can look back through the links and show all available context with the post. What I haven't done yet is copy that contexting information to a Q post's context when I find one in the backward linking. It'll be ridiculously easy once I set about doing it. Then, when a Q post is pulled up, all that stuff that linked back to it can show together with it.
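The backward-linking pass described above amounts to building a reverse map of >>references. A sketch (not the site's actual code; post bodies are plain text here):

```python
import re

def backlinks(posts):
    """Map post_no -> the post_nos that reference it via >>links.

    Pulling up a post can then show everything that linked back to it,
    i.e. its context.
    """
    ctx = {}
    for no, text in posts.items():
        for ref in re.findall(r">>(\d+)", text):
            ctx.setdefault(int(ref), []).append(no)
    return ctx
```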

70e498 No.670421


Hmm. Yeah just doing some easy math I can see how you would have more than 1mm records. We're at bread 815+ something here, and with 751 posts each that's over 600k here on 8ch alone.

You may be onto something with that. Is there a limit? https:// stackoverflow.com/questions/2716232/maximum-number-of-records-in-a-mysql-database-table

Looks like the number of rows may be determined by the size of your rows.

70e498 No.670470



So much data. It's mind boggling.

07564d No.670518


Yes, there may be a 1GB limit on the file size, and I'm right about there now. If I partition, I can get around that.

98bd4e No.670526

Below is the qtmerge modified raw dataset (text-only) as of 2018-03-14 02:07 UTC.

I'm putting this out in the hopes that it may be useful to others for ETL, mining, search tools, archiving etc.

Some notes:

* The data is a synthesis of the qtmerge datasets: https:// anonsw.github.io/qtmerge/datasets.html

* For an idea of threads that are available see: https:// anonsw.github.io/qtmerge/catalog.html

* eventcache.json contains the posts/tweets/etc. in chronological order. The type attribute currently dictates the local object structure (working to make this cleaner)

* refcache.json contains the detected post cross references (this is a work in progress)

* The referenceID attribute is the "primary key" between the files

* Timestamps are Unix Time and time strings are US Eastern

Extracted size: ~850 MiB

SHA-256 sum: d6ed89da05c0b714fc66b04ca66a8d701456d882d5f128ee1cef26c8d2e22eb6

http:// anonfile.com/dazfO8d4ba/qtmerge-text-2018-03-15_05.18.37.tar.bz2

07564d No.670571


That's just the general threads. When I started linking through the breads, I found that I needed many of the other threads, too. Most of those are smaller, though.

5f9a22 No.670617


> You want to be able to search across ALL 8ch? Not just Q Research? By platforms are you talking 4ch/8ch?

Not all 8ch. Just 4 and 8ch Q related threads. Q has posted in but a small part of all of the digging (and bullshit) threads and much info is contained in those threads. /pol/ was a cluster until adopting the /cbts/ threads, but they shouldn't be too hard to round up and include in the searchable database.

In fact, I'd only include the qresearch general threads since the GA/qresearch reset. Add the digging/ancillary threads as possible. Most of the gold is in the generals, IMO.

07564d No.670657


The reason I'm pulling in other threads is because they get cited as notable posts. I'm not bothering with them unless that happens.

d6b0f8 No.672305

File: bba5385da9ac1ce⋯.png (76.13 KB, 1464x582, 244:97, ClipboardImage.png)

I can get the other boards and other threads, the issue is disk storage. Linode gives me a lot of bandwidth, but only a few gigs of disk until I change my plan with them.

d6b0f8 No.672334

The limit of an OpenQM hash file (table) is 16TB. When this becomes a problem, I can create a distributed file (table) by primary key. Say, put all 8QR in one portion, 8GW in another. Simply a way to have physical storage allocated.

Pavuk session records are GUIDS. (don't worry, I'll purge anons out of the storage.) It was done because of commercial requirements for SOX and other audit compliance issues. Remember, I created Pavuk to build commercial apps.

The distributed file is built by using the first 2 bytes of the GUID from the primary key. Thus, it has component files:





Or 256 parts.

Theoretical table size:

256 x 16TB = 4096TB
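As a sketch of that bucketing: two hex characters of the GUID key give 16^2 = 256 component files, so the 16TB-per-file limit becomes 256 x 16TB of theoretical capacity (the function name here is made up, not Pavuk's):

```python
def component_file(guid_key):
    """Pick one of 256 component-file buckets from a GUID primary key.

    Two hex characters = 16**2 = 256 possible buckets, matching the
    256-part distributed file described above.
    """
    return guid_key.replace("-", "")[:2].lower()
```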


d6b0f8 No.672344


I'm going to look at your work.

d6b0f8 No.672421


I tried 4chan/cbts/index.html and got a 404 yesterday

d6b0f8 No.672572

File: ba02c76b860d580⋯.png (272.49 KB, 1748x920, 19:10, ClipboardImage.png)

Brother Anons, I can find the IDs of the threads by using the search function on Archive.is. For example, research general #2 was post number 799. Once I know this, I can go back to 8chan and pull up the thread.

Sadly, I cannot get it with JSON. I only can get HTML. This means parsing the HTML.

This means a new string parser; the data still goes into the same table as the JSON, just with more work. Here's what the posts look like in HTML:

d6b0f8 No.672659

I've put out a tweet thread showing the progress and asking if someone will step up to help lead a crowdfunding campaign so I can afford a bigger Linode.

41bee9 No.672688


>I tried 4chan/cbts/index.html and got a 404 yesterday

I'd expect that. Threads sunset there rather quickly. I think most everything from 4ch is in http:// archive.is/search/?q=%2Fcbts%2F

I got 22,900 hits. Some people used 4plebs and maybe even other archives. Need to know all of the archive sites used so we can add them to the soup.

A search on 4plebs from 10-28-2017 to the night of the bans, 11-26-2017, shows 714 hits.

https:// archive.4plebs.org/pol/search/subject/cbts/start/2017-10-28/end/2017-11-26/

41bee9 No.672709



I don't know diddly about crowdfunding, but I will certainly contribute. Are things like that generally paypal friendly?

41bee9 No.672743


Belay that last link. It searches only to midnight of the day specified. This one goes through the 26th.

https:// archive.4plebs.org/pol/search/subject/cbts/start/2017-10-28/end/2017-11-27/

41bee9 No.672834

File: 4b34a9ff28fb5df⋯.png (8.6 KB, 1061x83, 1061:83, cbts1.PNG)

File: 640134501c1855e⋯.png (3.63 KB, 567x80, 567:80, cbts2.PNG)

Here's some interesting trivia that I missed after being banned. I did see Q approving the first migration in real time, but missed this. Interesting.

d6b0f8 No.672886

File: 99bd8abf0435c23⋯.png (173.73 KB, 957x1032, 319:344, ClipboardImage.png)


I was just going to have folks send to my personal paypal account since I'm funding the site anyway. You can set up a regular monthly payment. I do that with others like Stefan Molyneux where we send $10/month.

d6b0f8 No.672897

File: b8602cfeff149e8⋯.png (111.48 KB, 1025x804, 1025:804, ClipboardImage.png)

d6b0f8 No.672908

We need to work together to get all of the data into the database. If someone could help with a Twatter feed from DJT - preferably raw and in JSON, that can be added to the posts table.

d6b0f8 No.672920


That was helpful. I would ask people in this thread to help develop the information model.

There is a "boards" table with the links to get data for each type. It can be expanded into which boards are archived where and I can automate the pulls.

41bee9 No.672980


>personal paypal account

Set up an account specifically for this, don't dox yourself. (((They))) will be able to find you, but the malicious shills won't.

d6b0f8 No.672997


already doxxed. I own pavuk.

41bee9 No.673070


Ha, OK. Thought that might be the case.

41bee9 No.673658

Here's another archive with over 1,200 threads:

https:// archive.fo/search/?q=%2Fpol%2F+-+cbts

Some good ones here missing in other archives. How many more are out there?




41bee9 No.673773

Found first CBTS thread on 8ch.

http:// archive.is/Pvbqq

023ac5 No.673832


Yes they will and I will add that my paypal account was subject to MUCH fuckery during the time I was posting a lot about PG on my twitter. Nov/dec 2016

fe09dd No.674321


I can probably get you what you need. What are you looking for specifically? All DJT tweets? Tweets with Delta's?

adacee No.674502


That sucks. I love the system, though! More user friendly than my crap attempts.

adacee No.674536


They sell "Storage Blocks" expansion way cheaper than more memory. Very fast systems already. Lots of data on the 8GB plan; buy another 100GB storage for way less than the next plan. Call Linode to get info on that.

41bee9 No.677084





Another one:

https:// yuki.la/

70e498 No.680546


>http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json

07564d No.690213


It probably won't be long until I find out if my host really means it when they say "unlimited".

07564d No.690263


Limits depend on the operating system. I'm not sure how much I'll end up needing in the end. I've got some full page web captures in my system that may bump up the size needed fairly fast. So far, I haven't outgrown the 500GB on my home system. It's about half full now. But that also includes just about all of my software. I have other drives, so I'm not limited to that 500GB. (Recalling when a 60MB hard drive was a big deal…)

07564d No.690320


Yeah, that would be cool to add to my system, too. I wonder where I should fit that into the task list. I've got to reparse anyway, so it has to be after that. (Backslashes weren't properly handled the first time around.) It was my plan to get to it eventually. So much to do! If you've got it in JSON files, I've got to believe it would be very easy to get them into my system.

07564d No.690349


>https:// yuki.la/

The archive sites are only as good as whether they're actually saving our stuff. What's the hit rate finding stuff there?

I'm not sure, but I think archive.is and archive.fo may be the same system. Mirrors, perhaps?

07564d No.690481


I don't have 4chan/cbts. Was Q posting there, too? If I recall correctly, we went from 4chan/pol to 8ch/cbts.

70e498 No.698628


AllQPosts smashed with DJTwitterposts by day

https:// github.com/QCodeFagNet/SFW.ChanScraper/tree/master/JSON/smash

07564d No.704953

I got the problem with the backslashes fixed. Also, I changed the way I process emoji characters. There actually might be a few more posts that get parsed in during the reparsing. I am in the process of reprocessing everything now. This is going to take a while. I'll let you know when the uploads are done, which will probably be tomorrow afternoon.

b08c93 No.705344

Speaking of searchability, here is a search engine anons can use that will let you search for all those things normal search engines won't, like stringers that include punctuation/symbols or exact spellings of short words and abbreviations, without the search engine being 'helpful', excluding the results you want and returning the results it thinks you want.

b08c93 No.705347

http:// symbolhound.com/

59e915 No.709043


> I think archive.is and archive.fo may be the same system

Yes, they sure look like the same system, as does archive.li. I must admit complete ignorance of how they are structured and how they work. I initially thought archive.is was for /pol/, but now I've found /pol/ and cbts all over the place. If any anons have insight, it would sure be appreciated.

59e915 No.709094


>we went from 4chan/pol to 8ch/cbts.

4chan/pol/ first posts were 10/28/2017. We were flushed by a bot storm on 11/26/2017 and regrouped on 8chan as CBTS. When that blew up the campaign became The Storm. When that blew up is when we landed on our own board qresearch/greatawakening.

Archives and threads are all over the place, one of our fundamental challenges aggregating all the info to be searchable.

07564d No.722324

All records and images that I have should now be up on the research tool.

I thought my post count was short on the site last night, but using the following statement on both, they are equal:

SELECT COUNT(`post_key`) FROM `chan_posts`

Funny thing is that when I pull up the table in phpMyAdmin, the row count does not equal the answer to that query. It's short on both. Don't trust the row count in phpMyAdmin when you view a table (for InnoDB it shows an estimate, not an exact count).

Total number of posts in the research tool is:


Next up: Getting the POTUS tweets into the database.

http:// q-questions.info/research-tool.php

43423a No.724053


>Has anyone thought to take full news articles and social data dumps, per person, and do sub text matching across the entire body of text to find exact matches?

07564d No.724127


I've thought to do it. The tagging feature can get us there. The problem is that tagging posts is a lot of work. I need to find a way to get others to help with that without compromising the database.

70e498 No.733102

OK brother codefags. I've stood up a simple API. It serves json and XML for your consumption pleasure.

It's currently set up to:

1) Scrape the chan automagically and keep an archive of QResearch breads and GreatAwakening.

2) Filter each bread to search for Q posts and include anything in GreatAwakening into a single QPosts list

3) Serve up access to posts/bread by list, by id, and by date.

I'm going to incorporate the TwitterSmash delta output next. I figure I can do a simple search across all Q posts easily. Searching across the breads is harder.

You can check it out here: http:// qanon.news/

McAfee says secure https:// www.mcafeesecure.com/verify?host=qanon.news

There's a sample single page app that shows how to use it. http:// qanon.news/posts.html

I still gotta set up my email account so if you spam me now, it's likely to get bounced. I'll check back in later.

My reason for doing this is twofold, I figured we could use it, and I'm looking at the job market in my area and thinking about changing it up. This is partially a learning project to open opportunities by using different tech. I'm claiming ignorance. My plan is to try out an elasticsearch node once I get this working as designed.

Let me know if you can think of a query/filter that you think would be useful. It's not proven to be too difficult to work new things in other than the ugly local path issue I came across working on it this morning.

Try it out anons.

43423a No.734330


I think you're misunderstanding my idea. The idea is to identify sources of narrative scripts being pumped into the public consciousness. Remember when Trump's speech at the '16 RNC was immediately phrased as "dark" in dozens of articles, tweets, etc? We need to know who's putting out the scripts ("dark") and who's repeating the scripts ("""journalists""" that articles with "dark" are attributed to, shitter users with "dark" in their tweets, etc)

The code could work in different ways but trying to automate everything at the beginning is hard. The easiest way to start would be:

>anon notices a suspicious pattern of the same language being used all of a sudden

<like "dark"

>anon enters the string that's being repeated into a text box

<bonus points if it's pure JS that can run locally rather than requiring a server, at least initially

>code ingests search results of news, shitter, faceblack, etc with that string from the recent past

<configurable in near term increments like past hour, past day, past 2 days

>anon is provided a list of results

From this simple aggregated news & social search, an anon can easily see by visually skimming the results how widespread the suspicious pattern of the same language being used all of a sudden is.

<next features

>let anons select search result items as suspect and enter them into a database that indexes on journalist/author, keyword, etc

>database can use search result item post date to build a timeline, to identify the earliest sources of the narrative script

At this point, with the database trained on common sources of narrative script repeating, it would be pretty doable to automate suspicious pattern detection by ingesting the full body of content from the sources and searching for sub text matches that exceed noise. Like if "the" is used in most of the article headlines and tweets, that doesn't mean shit because "the" is a common word, but if "dark", a much less common word, all of a sudden appears across article headlines and facebook posts, that would be pretty easy to pick up for human review.
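A toy version of that first manual step, assuming records have already been scraped into (author, date, text) tuples (all names and data here are hypothetical):

```python
def script_hits(items, phrase):
    """Find who used a phrase, ordered by date with the earliest first.

    The head of the list is the best candidate for the script's origin;
    everyone after it is a repeater.
    """
    phrase = phrase.lower()
    hits = [(day, author) for author, day, text in items if phrase in text.lower()]
    return sorted(hits)

items = [
    ("journo_b", "2016-07-22", "A dark speech at the RNC"),
    ("tv_host_a", "2016-07-21", "Trump goes dark in Cleveland"),
    ("blogger_c", "2016-07-23", "An upbeat convention"),
]
```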

07564d No.737563


>We need to know who's putting out the scripts ("dark") and who's repeating the scripts ("""journalists""" that articles with "dark" are attributed to, shitter users with "dark" in their tweets, etc)

You can search the word "dark" in my database as it is right now. If that word was used in chan discussions (and it was), you can get results for it. Is there something you think we need to add? Do you have an idea for an algorithm based on what we have?

Right now, though, I changed my mind about what to do next. I want to get the contexting code finished. When I've used my personal version of it, I learned quite a lot.

After that, I will work on getting the tweets in there. If anyone can point me to php code for that, it would be appreciated. I'm not talking about chan posts that link them, but rather the tweets themselves.

07564d No.737661


I've got a suggestion for the search: enter the following in the text field:


and also in a separate search


Those should find posts that use the word "dark" and include a link. I don't know how to do this better with what I have without doing some extensive programming.

70e498 No.739099


> I've been getting my twitter data from http:// www.trumptwitterarchive.com/data/realdonaldtrump/2017.json, 2018.json

07564d No.742213


He isn't keeping it up to date.

70e498 No.744374



There was a 9 day gap at the beginning of the year. Otherwise it's been updated. Unfortunately I think there were 2 markers in that time. Delta anon knows about it.

07564d No.746289


I didn't see anything past January.

70e498 No.746837


Refresh yer cache? I'm seeing Jan 9 - March 21 2018

07564d No.750792



Reverse order. OK, I see it. Thank you.

70e498 No.760314

Feckin dates. I got it all sorted out. Discovered a bug in the different times zones my dev server is on and the API webserver.

I've been sorting out small bugs and about to wire in the TwitterSmash. The automation part seems to be working good now that I sorted the date bug. I've got it set up to do hourly scrapes. Last run at 8:03pm 3-21 est. The scrapes themselves only take about 45 seconds - including the twittersmashing. There's a test smashpost page here to see the deltas in action. Not totally live Q post data online yet.

http:// qanon.news/smashposts.html

This is another test page using live data

http:// qanon.news/posts.html

I did this to test some code out. Get a random Q post.

http:// qanon.news/api/posts/random/?xml=true

I set up an elasticsearch node today to experiment. We'll see how that goes. Could be a huge pain in the ass to set up at a host. We'll see.

bef1f1 No.761567



I'm trying to help but you're not getting it. Reread my posts.

07564d No.763341


I think that's beyond the scope of what I'm doing. Hopefully, there will be enough here that what I have can help you do that research, especially after I finish the contexting work. Right now, I've had to reparse the database yet again to correct image links. I hope I've finally gotten it right because it takes an entire day to cycle through the entire set.

70e498 No.771168

Update your tripcodes codefags.

public readonly string[] ConfirmedTrips = new string[] { "!ITPb.qbhqo", "!UW.yye1fxo", "!xowAT4Z3VQ" };

http:// qanon.news/api/posts/943/?xml=true
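The same check in Python, using the trip list from the C# line above (the post shape follows the 8ch JSON quoted elsewhere in the thread; only the 'trip' field matters here):

```python
# Confirmed Q tripcodes, per the update above.
CONFIRMED_TRIPS = {"!ITPb.qbhqo", "!UW.yye1fxo", "!xowAT4Z3VQ"}

def is_q_post(post):
    """True when a post dict carries one of the confirmed Q tripcodes."""
    return post.get("trip") in CONFIRMED_TRIPS
```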

07564d No.773397



Thank you for the heads up. I've made the change in my code, too.

The export/import finally looks like it's ok. Please let me know if you run into issues.

I'm going to be pulling out the post range and thread range options from the form. They unnecessarily complicate things now that I've added date range capability.

I'm moving on to contexting now. Y'all are going to love that feature.

70e498 No.774681


yeah that sounds like a good one.

I've done some more work on the http:// qanon.news api. I managed to work out a coupla small bugs and get the TwitterSmashed posts integrated. Everything seems to be working as designed.

Here's the smashposts.html demo page. Shows deltas to Q posts within the hour.

http:// qanon.news/smashposts.html

I've going to add another result to the smashposts where everything is grouped by days. I'll probably put it in the posts API as well.

It's starting to look like this may be close to going on autopilot. Any interest in changes/additions before I move onto something else?

70e498 No.774698


I'd love to work out a local copy of the Jan 1 2018 - Jan 9 2018 @realDonaldTrump tweets. Those are missing from the trumptwitterarchive site. Anybody got access to that?

07564d No.775587



It looks good so far. One thing, though: you need to save the images. You're linking directly to the 8ch images, and those have a tendency to go missing.

70e498 No.781191


Hmm. Yeah I'll look into it. I can see that archive getting really big really fast. This thing's only been running for a month and it's over 400MB in JSON alone. I'll have to make sure what kind of space I've got avail.

07564d No.782643


But you're not saving more than the Q posts, right? There aren't that many Q posts, and he hasn't posted that many images. But if you're trying to save the entire thing, yes, it's really big and grows really fast. I'm not automatically saving the full size images, and there's still quite a lot in my set.

70e498 No.791554


I never figured that another image archive was what we needed. Each of the QCodefag installs has its own local archive. My concern was in preserving the JSON data from QResearch before it slid off the main catalog.

I'm going to put up a simpler list to show what's been archived. I'm showing 716 total breads, but again that's only starting at 2-7-2018. Q Research General #358 is my earliest full archive - it's up to #982 now.

That's 624 breads in 47 days, 13.2 breads per day. Est. 4,846 breads in one year at ~800KB/bread is roughly 4GB/year in JSON bread alone. Mebbe different if I moved to a DB.

I may have enough storage, but it's so hard to say. Any image archive estimates anons?
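Spelling that estimate out with the same numbers (treating ~800k as 800 KB of JSON per bread):

```python
breads, days = 624, 47
per_day = breads / days                # ~13.3 breads/day
per_year = per_day * 365               # ~4846 breads/year
bytes_per_bread = 800 * 1024           # ~800 KB of JSON each
gb_per_year = per_year * bytes_per_bread / 1024**3
print(round(per_year), round(gb_per_year, 1))  # prints: 4846 3.7
```

So "about 4GB/year" checks out as a ballpark; images would be on top of that.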

d6b0f8 No.791772


I just saw this info. I need to convert my monthly plan to an hourly plan before they'll let me buy storage blocks.

d6b0f8 No.791778

Pavuk Searchable.

663ab1 No.798718

Can someone post the original json of GA post 461 which was deleted? I pulled the json data from qanon.pub, and can use pieces of it to fill in my local copy, but I'd rather have the real thing if I can get it.

As an example, below is a comparison of the original 460 from 8ch and the archived version from qanon.pub. They are close, but the 'com' field did go through a filter to get into qanon's 'text' field. Not saying there's anything wrong with it, but I have the originals for all except 461. Am playing with python code to save all the json files locally for all relevant boards on 8ch, and can parse & search for keywords or q's trips, etc. and display in a browser. Since it's all stored locally, a search doesn't have to hit the net. It's not perfect by any means, but if I can clean it up a bit, I'll share if there's interest.

8ch original 460:


"com": "<p class=\"body-line ltr \">Updated Tripcode.</p><p class=\"body-line ltr \">Q</p>",

"name": "Q ",

"locked": 0,

"sticky": 0,

"time": 1521824977,

"cyclical": "0",

"bumplocked": "0",

"last_modified": 1521824977,

"no": 460,

"resto": 452,

"trip": "!xowAT4Z3VQ"


qanon.pub copy of 460:


"email": null,

"id": "460",

"images": [],

"link": "https:// 8ch.net/greatawakening/res/452.html#460",

"name": "Q",

"source": "8chan_greatawakening",

"subject": null,

"text": "Updated Tripcode.\nQ\n",

"threadId": "452",

"timestamp": 1521824977,

"trip": "!xowAT4Z3VQ",

"userId": null


Need 8ch original 461 please if someone has it.
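For comparison, the kind of filter that turns 8ch's 'com' markup into qanon.pub's 'text' can be sketched with the stdlib HTML parser (the real site's filter may differ; this just reproduces the example above):

```python
from html.parser import HTMLParser

class ComToText(HTMLParser):
    """Flatten a vichan 'com' field into plain text, one line per <p>."""
    def __init__(self):
        super().__init__()
        self.parts = []
    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n")
    def handle_data(self, data):
        self.parts.append(data)

def com_to_text(com):
    parser = ComToText()
    parser.feed(com)
    return "".join(parser.parts)

# The 'com' field from post 460 quoted above:
com = '<p class="body-line ltr ">Updated Tripcode.</p><p class="body-line ltr ">Q</p>'
```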

70e498 No.799873


Try this

http:// qanon.news/api/posts/962/

or this

http:// qanon.news/api/bread/452/?xml=true

add/remove the xml from the query string to get XML

663ab1 No.803461


>http:// qanon.news/api/posts/962/

Perfect - thanks! The xml flag showed me the exact pieces I was missing to rebuild my entry. Much appreciated and quite a handy api…

964d61 No.803653


Forgive me, lads. Where do i go for info on Valerie Jarrett? Got lost.

d6b0f8 No.803777

Linode is telling me that I can get block storage, but only by migrating my VM to the Fremont data center, getting a new IP address (SSL cert. etc.)

Crickets from followers whom I've asked to donate funds for the added expenses.

07564d No.803966


The search engine on the Research Tool works well. Try searching VJ, too.

http:// q-questions.info/research-tool.php

70e498 No.805300


Do you have to have block data storage? Any other options?

70e498 No.805321


Glad it was useful. The posts API numbering is a bit squirrelly till you get used to it. The post ID is the post count starting from 1 on Nov 28 2017.

So to find out it was post #692, I had to view all posts (on posts.html or any of the QCodeFag installs) to get the post #. The bread # is in the post as threadId.

d6b0f8 No.809001


What *other* options?!?!



If I don't have enough storage, where am I going to store the data?

If you don't know about IT, you should not be in this thread.

70e498 No.809048


Fuck off nigger. I'm just trying to come up with other ideas. I've been in IT for over the last 2 decades. I know exactly whats going on.

My point was, hosting can be found on the cheap if you look around. Not sure you NEED SSD. What you need is storage space. I was thinking drop the SSD for cheaper storage.

Whatever, it's your problem. You seem to be capable of figuring it out.

d6b0f8 No.809084


I'm sure you're really good at building PCs for your aunt Martha. Plugging in the cards and loading and reloading Windows.

70e498 No.809146


Hurts me to my core!

No I write the software. Whatever. Deal with your own problem - it doesn't concern me.

9a029c No.809411





Can't find contact info on your site(s). Link?

07564d No.810060

I decided to prune. Too much garbage is in the chans.

d6b0f8 No.810605

YouTube embed. Click thumbnail to play.

Raw video.

d6b0f8 No.811118

YouTube embed. Click thumbnail to play.


07564d No.838965

File: 997560d2ea60e2f⋯.png (836.81 KB, 1228x926, 614:463, resting.png)

The research tool is undergoing extensive overhaul at the moment.

70e498 No.843897

I think I finally managed to squash the date bug in the QPosts/DJTweets.

I took the 60min delta restriction off - and it's applying each day's tweets on each Q post to allow you to see all the deltas.

http:// qanon.news/smashposts.html

07564d No.864973

File: 13780b00acc2ffc⋯.png (273.88 KB, 1229x896, 1229:896, GettingLucky.png)

Sometimes I get lucky.

07564d No.879844


The Research Tool is back up with a more concise data set. Much will be added in the next several days as I return to development of the contexting feature.

http:// q-questions.info/research-tool.php

70e498 No.887653

File: 54e4b21d5b8fcc8⋯.png (182.93 KB, 1197x986, 1197:986, Untitled-1.png)

File: 495288dfb71a59a⋯.png (58.1 KB, 1197x986, 1197:986, Untitled-2.png)

I've been thinking about a timeline for the past few days. I looked into different solutions and found timelineJS, which works pretty well.

I managed to wrangle the API data into a timeline. I'm planning on adding in the DJTwitter data and ideally news/notable events.

Once I can get the twitter data in I'll cut it loose. I was hoping to figure out an easy way to get other data into the timeline. News/notables. Any ideas? QTMergefag? You got good news/events?

Here's what it looks like:

07564d No.890080


If I can figure out how to import the twitter posts WITH the images, getting a timeline in Research Tool system is a no brainer. The JSON someone directed me to does not appear to have the image links, unfortunately. The images are essential to some of the tweets.

The plan is for POTUS to have his own post type. Then all one need do is select both q-post and potus posts in the same search, and they'll be displayed properly interleaved.
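The interleaving step that's described is just a merge-by-timestamp. A minimal sketch, assuming each post is a dict with `post_type` and a numeric `timestamp` field (illustrative names, not the actual Research Tool schema):

```python
# Merge Q posts and POTUS posts into one chronological stream.
# Field names (post_type, timestamp) are stand-ins for the real schema.
def interleave(q_posts, potus_posts):
    combined = q_posts + potus_posts
    return sorted(combined, key=lambda p: p["timestamp"])

q = [{"post_type": "q-post", "timestamp": 100},
     {"post_type": "q-post", "timestamp": 300}]
potus = [{"post_type": "potus", "timestamp": 200}]

merged = interleave(q, potus)
# merged order by timestamp: 100 (q), 200 (potus), 300 (q)
```

Because the sort key is the timestamp alone, adding further post types (news, notables) later slots into the same display logic with no extra work.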

70e498 No.891187


I think the timelineJS handles that for you if you add it as media/tweet to each slide.

07564d No.891871


OK. I guess I'll have to take another look at it. Right now, though, my priority is to get the contexting feature working. I do wish there was a way to safely hand off some of the work on the site I'm putting together. There's so much to do! But I have no idea how to know to trust someone. Clowns will be clowns.

70e498 No.892076

File: 47a15bed31ad982⋯.png (455.88 KB, 1197x936, 133:104, ClipboardImage.png)


Agree. I've been thinking about trying to work out a way of collab. I'm sure I could come up with a way to prove we're who we each say we are. Unless the clowns are here building community Q research tools…

Check it out. I got the twitter working.

What I can say about this timeline is that there's a lot of events on it. There's Q posts batched down to days across 98 days. Add in the Tweets and there's a lot going on. Each day/tweet == a slide. It's definitely more than it was probably designed to handle. It takes a minute to make sense of the somewhat sizable JSON data and then render the display.

70e498 No.892089


FOK delete this please

07564d No.892772


>It takes a minute to make sense of the somewhat sizable JSON data and then render the display.

I just have to make sense of a few of them. Then I can come up with an algorithm to parse them into the structures I already have developed. My site is quite capable of handling multiple sources (chan, tweet, other posts) if I can do that much.

70e498 No.892975


"scale": "human",
{
  "start_date": {"year":"2017","month":"10","day":"28","hour":"0","minute":"0","second":"0","millisecond":"0","display_date":"2017-10-28 00:00:00Z"},
  "end_date": {"year":"2017","month":"10","day":"28","hour":"0","minute":"0","second":"0","millisecond":"0","display_date":"2017-10-28 00:00:00Z"},
  "text": {"headline":"HRC extradition...","text":"The body text...<hr/>"},
  "media": null,
  "group": "QAnon Posts",
  "display_date": "Saturday, October 28, 2017",
  "background": null,
  "autolink": true,
  "unique_id": "1dba35d4-46ac-4c5f-94d7-1e6b0f53ad4d"
},
{
  "start_date": {"year":"2017","month":"10","day":"28","hour":"21","minute":"9","second":"0","millisecond":"0","display_date":"2017-10-28 21:09:00Z"},
  "end_date": {"year":"2017","month":"10","day":"28","hour":"21","minute":"9","second":"0","millisecond":"0","display_date":"2017-10-28 21:09:00Z"},
  "text": {"headline":"&Delta; 25","text":"2017-10-28 21:09:00Z<br/>@realDonaldTrump<br/>After strict consultation with General Kelly..."},
  "media": {"url":"https:// twitter.com/realDonaldTrump/status/924382514613030912","caption":null,"credit":null,"thumbnail":null,"alt":null,"title":null,"link":null,"link_target":"_new"}
}
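For orientation, a sketch of how one post record could be mapped into a slide of that shape. The `start_date`/`text`/`media` layout follows the TimelineJS v3 event format; the input field names (`when`, `headline`, `body`, `url`) are illustrative, not the actual API's:

```python
from datetime import datetime

def to_slide(post):
    """Map one post dict (assumed fields: when, headline, body, url,
    group) into a TimelineJS v3 slide."""
    dt = datetime.strptime(post["when"], "%Y-%m-%d %H:%M:%S")
    stamp = {"year": dt.year, "month": dt.month, "day": dt.day,
             "hour": dt.hour, "minute": dt.minute}
    return {
        "start_date": stamp,
        "text": {"headline": post["headline"], "text": post["body"]},
        # TimelineJS renders a tweet card if media.url is a twitter link
        "media": {"url": post["url"]} if post.get("url") else None,
        "group": post.get("group", "QAnon Posts"),
    }

timeline = {"events": [to_slide({
    "when": "2017-10-28 00:00:00",
    "headline": "HRC extradition...",
    "body": "The body text...",
    "url": None,
    "group": "QAnon Posts",
})]}
```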

07564d No.893062


What is this from?

07564d No.893190

I decided to see if I could find some hidden Q:

SELECT * FROM `chan_posts` WHERE `post_type` != "q-post" AND `author_hash` IN (SELECT `author_hash` FROM `chan_posts` WHERE `post_type` = "q-post")

This statement found 718 of them I hadn't identified.

70e498 No.893234


That's the output from a new timeline api I'm working on. It plugs directly into the timeline.JS.


Holy shit. That's notable there innit? Are you the OP in this thread?

07564d No.893321


Figured out quickly that I had to add a couple additional checks.

SELECT * FROM `chan_posts` WHERE `post_type` != "q-post" AND `author_hash` IS NOT NULL AND LENGTH(`author_hash`) > 0 AND `author_hash` IN (SELECT `author_hash` FROM `chan_posts` WHERE `post_type` = "q-post")

Still came up with 120. Perhaps a couple of them were misidentified as Q in the first place?
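The refined query above can be exercised against a throwaway SQLite copy; the two-column schema here is a minimal stand-in for the real `chan_posts` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chan_posts (post_type TEXT, author_hash TEXT)")
conn.executemany("INSERT INTO chan_posts VALUES (?, ?)", [
    ("q-post", "abc123"),   # known Q post
    ("anon",   "abc123"),   # same hash, unmarked -> hidden Q candidate
    ("anon",   "zzz999"),   # unrelated anon
    ("anon",   ""),         # empty hash, must be excluded
])

rows = conn.execute("""
    SELECT * FROM chan_posts
    WHERE post_type != 'q-post'
      AND author_hash IS NOT NULL
      AND LENGTH(author_hash) > 0
      AND author_hash IN (SELECT author_hash FROM chan_posts
                          WHERE post_type = 'q-post')
""").fetchall()
```

The `LENGTH(author_hash) > 0` guard is what kills the false positives: without it, every post with an empty hash matches any Q post that also has an empty hash.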

70e498 No.893483


Interdasting. I'd have to see a list.

http:// qanon.news/timeline.html

http:// qanon.news/Help/Api/GET-api-timeline

07564d No.893908


At least one of the ones I had identified as Q, maybe 2, had been mislabeled. Plus, a known impostor got tagged as Q. Not sure how that happened. I'll have to fix it. But a few other interesting ones popped up.

I made one of my editor features available to you so that you can have a look. On the search form, go to the bottom and check "In processing list:" box. Leave the rest blank. And you can have a look for yourself.

http:// q-questions.info/research-tool.php

70e498 No.894169



Yeah it looks like there are some missed posts in there for sure. You may have done some good work on that one.

07564d No.894303


ID:RrydKbi3 in post 147683274 definitely looks misidentified to me.

07564d No.894338


I have to go to an appointment now. But I'll fix the known misses this afternoon, and I can tag you to have another look, if you like.

70e498 No.894345



Agree. That's the only post with that ID. Nothing ties it back to Q.

Same for Anonymous ID:9o5YWnk7 2017-10-29 19:35:45 Thread.147146601 Post.147171101

70e498 No.894359

07564d No.898657


There are more of them than you're seeing, actually. I've just discovered that I'm still having issues with the import/export process. Not everything I've set to export is getting up there. I'll have to run that to ground tonight and fix it. I thought I had that worked out already. When I was still thread-based, everything I was exporting from the home machine was importing just fine into the online machine. But I guess I changed the logic somehow when I went from thread-based to post-based. (It can sometimes actually be more difficult to change a program than to write it for the first time.) At the moment, some of what I've said below may not be visible. But sometime tonight, it should all be there.


He responded to Q. That's it.


Yes, he was just responding to a Q post. He isn't Q. I'm not sure at this point if it's an approved post or just another response. I'll have to take another look at it when I'm working with the maps again. For now, I've demoted him to a regular anon. And I'm removing the posts that weren't marked as Q from the online database, at least for now.

I'm not sure what to think about ID:afa548. I had the impression that a hash was good for only one thread. And yet he shows up as a hidden Q in one thread and with his trip in another. Same with ID:4533cb, but there was only one unmarked post for that one.

ID:5ace4f has only one marked post. It looks like he got marked as Q because he's on a map, but I'm not sure it's really him. The other posts look interesting and possibly relevant, though. Still, it's possible the one should be marked as approved rather than as a Q post.

ID:071c71 got reused on a different board. On one, with a non-Q trip. But it's interesting who that ended up being.

ID:23de7f looks entirely legit and probably could be marked.

With ID:d5784a, you can see what I can do to imposter clowns.

ID:1beb61 and ID:26682f look like imposters, but I haven't heard one way or the other on those. Maybe I need to put date ranges on my trip test?

Some hashes are particularly colorful in their unmarked posts. Not sure what the story is there. But I do believe the one that's marked is legit. Maybe another should be marked, but I certainly wouldn't mark all of them.

07564d No.900583


They're all up there now. There was something weird about two of the records. In one case, someone did something to a file name that I didn't know could even be done! I'll just have to edit that in the database, and it should be fine if it ever needs to be exported again. And I don't know what the deal is with the other. I pasted the SQL statement for it directly, and it worked just fine. Slash issues, maybe.

07564d No.904541



I've been looking further at this. I don't think the one in cbts is Q. The hash just happens to be the same. But there's something like a 3 month gap in when the hash was used.


Fairly certain he's fake, and I'm marking him as such.

A couple of the ones I'd incorrectly marked as Q had the same post number as an actual Q post on another board. So I suppose it's easy to see how that could have happened. Now that Q uses a trip, that's much less likely to happen. They're probably relics of a time when I hadn't developed my toolset so well yet. Now, it's easier because the editor mode of the research tool has drop boxes and the like for making those kind of changes. When I had to use phpAdmin, I was somewhat flying blind because I couldn't see as well what was really in a post. Now I can see the posts in their final form when I'm making changes like that.

1bfbc2 No.904574


Not constructive newgro. You would do well to realize the calibre of techs that browse chans and do what you can to get their help rather than get salty.

07564d No.907276


By the way, this has not been an idle exercise. One of the things I'll be doing is keeping track (programmatically, in the data) of context chains that reach back to Q. So it's important that Q be properly identified. To that end, finding hidden Q has been valuable. Not only did I find Q gems I had not recognized (probably because they're on maps I haven't worked through yet), but I was able to recognize some misidentified posts as well as get the imposters properly marked. So it's all good.

70e498 No.958418

Qanon.news bumped from the bread anons.

Somebody said that the site was serving malware and it was taken out of the bread. I posted in the meta thread to have BV check it out and he gave it the OK. I spent an hr or so trying to get it back in. No luck.

I'm not interested in begging - but I do want people to use what I've been working on. I'll see what happens after dinner I guess.

70e498 No.965953


Meh. I've been thinking about it. After reading all about codefags problems, bandwidth issues, SSL certs, all the other qcodeClones… It may be better to just stay quiet and let people use it when needed. I'm a little disappointed that it was so easy to get something removed from the bread.

What I've been working on is really more backend style anyways. I have been thinking about a few different things though.

I saw one anon post something about there needing to be an RSS feed for QPosts. I think that should be pretty easy to provide. If I get some time I may whoop something out.

I've been playing around with the timelineJS. I worked it up where you can select a specific timeline. Qposts. DJTweets. Etc. Q has mentioned timelines a few times and I've been looking around trying to find threads that were timeline based. No real luck so far. Anyways, I was thinking about working on some different timelines.

I've been starting to wonder if moving to a database solution rather than file-based JSON is going to be worthwhile. Better speed probably? Built-in caching? Do I want that for an api? What does everybody else think?

70e498 No.966304


Even in here.

07564d No.967035


We must be over the target.

70e498 No.981495

I built a new API to get a specific post from a specific bread. Maybe I'll get it uploaded today.

Looks like ~/api/bread/981411/981444/

to get >>981444

Researching an RSS/ATOM feed. That looks to be low hanging fruit.

f86e40 No.983853

Very afraid they are!

Goodbye trolls and shills!

70e498 No.984329

I was contacted by a guy that says he's from this site http:// we-go-all.com

Looks to have a Qcodefag repo installed on a page. He wanted to know if he could help at all and I asked him if he had posted anything in here.

He doesn't know anything of the codefags thread. He's interested in access to the api. I don't wanna dox the guy, but this name matches a guy that works for Representative Jared Polis (D-CO 2nd)

5th-term Democrat from Colorado.

http:// www.congress.org/congressorg/mlm/congressorg/bio/staff/?id=61715

Probably nothing. The QCodeFag stuff is open, 8ch is open. Nothing to worry about anons?

8e73cf No.984766


70e498 No.988865


All updated

New Qanon ATOM feed:

I managed to throw together an ATOM feed here:

http:// qanon.news/feed


http:// qanon.news/feed?rss=true

It returns the last 50 of q posts. It's a work in progress. I can include referred posts, images etc.

New Timeline api: Timeline api that shows Qposts and DJTweets. I also set up an Obama timeline that another anon pointed out. I'm planning on adding more to it and some other timelines I'm thinking about. You can see a few at http:// qanon.news/timeline.html
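A feed like that can be thrown together with nothing but the standard library. This is a sketch only; the feed id, entry fields, and 50-post cutoff mirror the description above, not the actual qanon.news code:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize without a namespace prefix

def build_feed(posts):
    """posts: list of dicts with id, title, updated, body (assumed fields).
    Returns a minimal Atom feed of at most the first 50 posts."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Q Posts"
    ET.SubElement(feed, f"{{{ATOM}}}id").text = "urn:example:qposts"
    for p in posts[:50]:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = p["id"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = p["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = p["updated"]
        ET.SubElement(entry, f"{{{ATOM}}}content").text = p["body"]
    return ET.tostring(feed, encoding="unicode")

feed_xml = build_feed([{"id": "urn:example:1", "title": "Post 1",
                        "updated": "2018-04-10T00:00:00Z", "body": "text"}])
```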

07564d No.989046

With the contexting problem I'm working on, I'm thinking I need to also write a "mea culpa" system for when a bread (or bread-like) post is not properly identified. It would go in and recalculate context when the status of a post changes. This way, I don't have to be so concerned whether bread posts are properly identified at the outset, and I can just get on with it.

f47016 No.989332

Hey CodeStuds - I was wondering if there's a quick way to find all posts in the qresearch thread by 'U'? I've run across a couple and I've really enjoyed them. I am not trying to take anything away from 'Q' drops - I owe 'Q' a ton for waking me up. But the 'U' drops always ease my mind and make things clearer for me…not sure if they're benefiting anyone else in the same way or not. I wanted to grab them all if I can find them. Thanks Patriots. #WWG1WGA

07564d No.989845


There could have been before I took everything down and then uploaded only select posts. But to do what you want, I still would have had to set up a whole word search mode, and I didn't have that yet. I abridged my public database due to obnoxious content by shills. I don't want to republish that stuff. I won't put the whole thing back up unless I have a way for visitors to flag posts for review, and right now I don't.

07564d No.990025


If all you want to search are Q posts, you could try using my system. The way it's set up, you can't force it to look at the first or last letter of the post. But you could try doing searches with a space before and after, or a period before and after, and other such things to force a word search. The pattern syntax of the LIKE statement isn't strong enough for much more than this.

http:// q-questions.info/research-tool.php
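The padding trick described there can be mechanized: wrap the term in each plausible delimiter pair and try every resulting LIKE pattern. A sketch (the real search form just takes the padded string directly):

```python
def word_like_patterns(term):
    """Approximate a whole-word search under SQL LIKE by surrounding
    the term with likely delimiters (space, period, comma)."""
    delims = [" ", ".", ","]
    return [f"%{a}{term}{b}%" for a in delims for b in delims]

patterns = word_like_patterns("U")
# yields patterns like '% U %', '% U.%', '%.U %', ...
```

This still misses a term at the very start or end of a post, which is exactly the limitation noted above.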

f486e4 No.990817

Thank both of you Patriots for your responses. I will do some regexing around. Be safe anons. Love you guys.

70e498 No.991237


Anything is possible.

U is the username? Any other identifying info? Do you know of a post you could point us towards?

07564d No.1024355


Let me clarify something. Is U a name? Is that the whole name? If I've made it public, you can search that on my site already. If not, I can take a peek and possibly make that public for you if it isn't shill stuff.

07564d No.1024641


I found 1 in qresearch and 3 in 4chan. I've added them to my public database for you. I don't see any real revelations in them, though. Enjoy!

http:// q-questions.info/research-tool.php

70e498 No.1028050

I've discovered the machine broke for a few hours on March 27-28 and I'm missing some json. Am I the only one saving off json or does some other codefag have some to send my way?

PageScraper to json?

70e498 No.1028183


Nevermind. The JSON I needed had slid off the catalog but was still avail. Thanks CM!

07564d No.1030259


It probably should be part of my work eventually, but it isn't yet. It's taken some time to get to that contexting feature. I'm finalizing the algorithm now.

A context chain will begin with a post that has been listed in a bread post and go backward through the links. These are either from the top of the thread or later where the next baker is being told what to include.

Links will also be followed backward from Q posts.

Contexts will stop at bread posts and not include them. (The intent is for context chains to stick to one topic as much as possible.)

When a post that includes a map is encountered, the posts from the map will not be included in the context chain, but links from the text of the post will be included. (Same reason as above: Maps include multiple topics.)

I will keep track of context chains that include Q posts. These can be shown with the Q posts. To minimize confusion, I will be displaying the context chains in separate bordered DIVs with a display/hide button. Not sure yet which to make default. Probably the hidden state to minimize clutter. I MIGHT parse the description of the leading post of the chain from the bread post into it. In the hidden state, this would be all that would show.
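Under those rules, the backward walk amounts to a graph traversal with two special cases. A rough sketch; the post structure and field names here are assumptions for illustration, not the actual implementation:

```python
def context_chain(start_id, posts):
    """Walk links backward from start_id, per the rules above:
    stop at bread posts and exclude them; when a post carries a map,
    follow only the links in its text, not the map's own post list.
    posts: {id: {"links": [...], "text_links": [...],
                 "is_bread": bool, "has_map": bool}}  (illustrative)
    """
    chain, stack, seen = [], [start_id], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in posts:
            continue
        seen.add(pid)
        post = posts[pid]
        if post["is_bread"]:
            continue  # contexts stop at bread posts, not included
        chain.append(pid)
        # maps span multiple topics: keep only links from the post text
        links = post["text_links"] if post["has_map"] else post["links"]
        stack.extend(links)
    return chain

posts = {
    1: {"links": [2, 3], "text_links": [], "is_bread": False, "has_map": False},
    2: {"links": [],     "text_links": [], "is_bread": True,  "has_map": False},
    3: {"links": [],     "text_links": [], "is_bread": False, "has_map": False},
}
chain = context_chain(1, posts)  # bread post 2 is excluded
```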

70e498 No.1034522

File: 4bdff4d0782661a⋯.png (55.51 KB, 355x327, 355:327, ClipboardImage.png)


Interesting that you should post that anon, I've been thinking the same thing. We need a crawler. Sounds like a great idea. A better way of visualizing the context thread would be great. Ya know, I've been reading about Google. PageRank. How that was designed in the beginning. Links you come across that have a lot of responses can be either good or bad on 8ch.

With the new breadID/postID feature I rolled out you could find anything you were missing for sure.

So you think your initial targets are just the baker posts and the other posts that are deemed notable?

I've been wondering if we could use a hashtag internally for our own benefit. #notable. That kind of thing.

It sounds like an interesting project. If I can help at all let me know.

73cc1f No.1040483


Hmmm…. I wasn't thinking about doing an indented method of arranging things. Should I be?

And if I knew how to pass off some of the work to others, I'd do it. It's a LOT for one person to do. One of the reasons it's been taking so long is because I'm still adding to the database, etc. If I had left the entire database online, perhaps? But the clowns were shitting things up with some truly raunchy stuff, and I didn't want to republish that. Truth is, though, that I've done some preliminary work on this already. It shouldn't take me long to finish the coding. But it will take a while to do the following:

-> properly identify the bread and map posts on some 2300 threads (Yes, this matters.)

-> identify the posts listed on the map posts

Even so, I've identified enough bread and maps already that some interesting stuff should begin floating to the surface. That's part of what is taking so long. The code is pointless without at least some of that done.

I'm eager to get to work on this. I lost an entire evening/night due to a power outage.

73cc1f No.1040716


I think what I'm getting at is that it's difficult to share the work without putting the entire database back online again. If I do that, I may have to do the following:

-> Buy dedicated hosting. If I do that, I'll be putting a donation button on the site for sure. So far, this has all been from my own time (a LOT of it) and resources.

-> Including a "report this post" button. Like I said, I don't want to be republishing truly obnoxious unrelated stuff. But it's all on me right now, and I can only do so much by myself. I'd have to let the community help me control that content.

But you know, really, the way I'm doing things now has a good side to it. There's a lot of fluff in the complete database. The way I'm doing things now eliminates a lot of that. You're going to get the dense info rich posts this way.

73cc1f No.1049462

The program can now save data for the contexting. Tomorrow (aka, after I wake up in the morning), I will be working on display.

70e498 No.1051320




I bought hosting from Godaddy. Unlimited bandwidth and 100GB storage. Economy plan on sale was $12/year. I think I even got another domain with that deal for $1/year that I'm not even using.


I hear ya on time. My shit got bumped from the bread because 1 anon got confused about a malware notification. I've got 2 pretty solid months of time in on what I've been doing and got taken out by a single post.

As we reach more and more of the masses, the information is going to appear on more sites that show ads/donations. It's a way of paying for the infrastructure needed to provide the service. I see nothing wrong with it.

73cc1f No.1059305

File: 73e1b2cc4c7a6c8⋯.png (54.42 KB, 1143x775, 1143:775, Research-Tool-1.png)

File: d74c80a0e2d55d9⋯.png (87.41 KB, 1137x769, 1137:769, Research-Tool-2.png)


The Research Tool can now display context the way I described above EXCEPT that I have not built in a show/hide button yet.

Right now, you have available to you SOME context that I calculated during my initial work putting together a contexting feature a couple months ago. I have more up through the date on the first image, but I have to get an export/import process built to get it into the online database. Since I have an export/import system for the posts, it shouldn't take much to make a modified version for the contexts.

My current task list is:

– Build the export/import process for the contexts.

– Get the contexts calculated for the 2300 or so more threads that I currently have. This could take several days.

– Then perhaps I'll look at getting that show/hide button in there. I might do it in the middle of working on getting the contexts calculated if I get bored of that.

– After that, including POTUS tweets is next.

http:// q-questions.info/research-tool.php

70e498 No.1059896


Wow anon. It's coming together. It will be great to see it once finished.

Interesting what you are doing with the links. I think some of my pages are linking like the qcodefag sites. The RSS I hooked up to go back into the api. Think I should change that?

73cc1f No.1060038


That's up to you and how you want to display your data. It might be cool to automate at least the downloading of new threads for what I'm doing. But to get the contexting right, I have to go through what comes in anyway. As mentioned before, not properly identifying bread and maps can overload the context chains.

73cc1f No.1087614

Contexting functionality is complete. The export/import process to make calculated contexts is complete.

I asked Anons on the general thread whether it is more important to calculate the contexts or to include POTUS tweets. The ones who responded want the tweets, so that's next.

51250b No.1087924


I think the messege 'we are being set up' is in response to the SC failing to pass the IMMIGRATION BILL. Also POTUS tweeted CA will not be accepting national guard to border.

https:// www.denverpost.com/2018/04/17/neil-gorsuch-immigration-law-vote/

70e498 No.1088682



Let me know if you want to hit the smash data. I'll set you up.

I rejiggered the links on some of my pages. It was set up like the qcodefag sites where each post contained a link back here. I changed that to a self referencing link instead. I decided to not be the cause of any more traffic back here.

Statistics show that people coming to my site are interested primarily in the presentation pages - not the API. I think what I've decided to do is remove all references to the API - but still provide it. Default to the posts page or something. I got a few ideas.

73cc1f No.1090066


That would be great! A JSON source would speed that process along greatly.

70e498 No.1090922


Look at the SmashPosts

http:// qanon.news/Help/

Tell me what ya want and I'll see what I can work out.

73cc1f No.1091190

I'm looking at the help page, but I don't understand how to actually make the call to your API. It looks like the call I would want to use is

GET api/timeline/{2}

but I don't see how to actually implement it.

73cc1f No.1091273


I think I figured out what I need to do. I just need to add the path to the URL.

73cc1f No.1091428


There are only 32 tweets in the JSON I got with api/timeline/2. There must have been more than that since October. Maybe I need a different call?

4a2958 No.1091802

My search-fu is nonexistent & need help for something current:

Somewhere within the past few weeks, someone posted a manual for Mueller firing protests. Didn't see it as a notable in BoB. Think it might have been pinched from ShariaBlue or the like. Thought it was a pdf, but not sure. Couple of screengrabs posted. In any case, it was a pretty thorough treatise on how to organize the march, chants, dealing with infiltrators (:D) and other stuff.

A couple of posts appeared today where one city (Pittsburgh) police department announced they were preparing for "semi-spontaneous" Mueller firing riots. That means they have that manual (but aren't disclosing it).

If we can find that manual again and post it all over that town's (and other) social media, it will awaken many to the fact that most of these protests are always preplanned.

Anyway, sorry for the hijack, but appreciate any help.

I just can't find it.

73cc1f No.1091988


Do you recall any words that would have been in the post?

4a2958 No.1092170


Someone found the site where it was from in the current bread:

https:// act.moveon.org/event/mueller-firing-rapid-response-events/search/

I could have sworn it was the whole "rapid events response manual" from MoveOn or allied organizations as a standalone doc.

"Mueller" would return too many hits.

Maybe Mueller + fired + protest(s) or something. Maybe add "plans" or "manual"

This is why their Mueller firing riots plan should get out into the public domain before any protests occur:

http:// pittsburgh.cbslocal.com/2018/04/18/robert-mueller-pittsburgh-police-prepare-riots-if-trump-fires/

Normies will realize how scripted all these protest marches are.

On phone so can't grab the whole site.

TY for any help!

73cc1f No.1092387


Try these. I was not able to find any PDF files posted recently about this.






>>725107 (Unfortunately, I was not able to find the fullsize image of this one. Put a request on the general thread as well as Lost & Found if you really want it. Ask them to put it in the Lost & Found thread so you can find it if you look later.)


73cc1f No.1092397


Looks like those got deleted. I'll make them available on the Research Tool for you.

http:// q-questions.info/research-tool.php

Look in a few hours. I have to run to an appointment right now.

70e498 No.1092764


The Smash API will give you more data you want.

You probably don't want the timeline stuff just yet. Unless you want to just stick with the default q/DJT timeline. Just do a get on the timeline API. The timeline API filters out all the tweets to just show the 5,10,15… deltas.

Yeah, gotta add the full path to the URL. If you're hitting it programmatically I gotta give you access. What domain would you be calling it from?

7ceb42 No.1092813


I believe you are talking about this website:

https:// act.moveon.org/survey/resistance-recess-host-materials

4a2958 No.1092887


Yes, that's most of the material, but it had been put into a document (pdf or doc, I think) and indexed.

Much easier to forward a doc to which notes can be added than point normies to a site which is hostile-owned. That document (in whatever format) contained all the articles on that page and more. Was well done by somebody.

4a2958 No.1092973


Somebody found it!


This is the basic protest manual all Soros/ SEIU and associated groups use.

Great doc to hand out to redpill people. Leave the redline the Mueller title and add the protest du jour.

Found in this bread:

https:// 8ch.net/qresearch/res/1092389.html#1092719

73cc1f No.1093667


I'm glad you found it. I'm beginning to think that I need to get the entire database back up there again, even if I have to not upload the images. We've had a couple of search requests like this for which I've had the data. In this case, the original posts had been removed, which would explain why he couldn't find it.

4a2958 No.1093804


Since it was in a Scribd doc, not sure it would have been found anyway, unless someone commented on it using key words.

I couldn't even hazard a guess as to what percentage of information here since Day 1 is critical vs. otherwise. Throughout it all, it's painting pretty clear pictures of the players & their proclivities, even if we haven't found a smoking gun yet.

In any case, thanks again for everyone's efforts.

73cc1f No.1093832


One of the posts I found would have led you there.

73cc1f No.1093909


There is an awful lot of absolute garbage posts out there, to be sure. And now that there are over 1.5 million of them, there is no way one person can censure out the stuff that absolutely should not be republished. I don't like the idea of putting all of the unreviewed stuff up there without their images, either, since a lot of the intel is in those images. It's a tough call. Even though I do have a content warning on the research page, I have concerns about the legal side of just blindly posting some of those images. I most definitely couldn't do it without a reporting feature.

73cc1f No.1096204


I was hoping for the complete set of Trump tweets since Q showed up in late October. Do any of your API calls provide that?

70e498 No.1096311


Well you can get all those from the trumptwitterarchive. What I did is group them into days that Q posted, and then only calculated the ones that DJT tweeted after Q posted.

If you check the API you can see the data, or look at http:// qanon.news/smashposts.html to see it more visually.
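The grouping described there boils down to: for each Q post, keep only the Trump tweets from the same day that came after it, and record the minute difference. A sketch with illustrative timestamp formats (the real smash data carries full post/tweet records):

```python
from datetime import datetime

def deltas_for_qpost(q_time, tweet_times):
    """Minute deltas from one Q post to each later same-day tweet.
    q_time / tweet_times: 'YYYY-mm-dd HH:MM' strings (illustrative)."""
    q = datetime.strptime(q_time, "%Y-%m-%d %H:%M")
    deltas = []
    for t in tweet_times:
        tt = datetime.strptime(t, "%Y-%m-%d %H:%M")
        if tt.date() == q.date() and tt >= q:  # same day, after the Q post
            deltas.append(int((tt - q).total_seconds() // 60))
    return deltas

d = deltas_for_qpost("2017-10-28 21:09",
                     ["2017-10-28 21:14",   # 5 min later
                      "2017-10-28 21:34",   # 25 min later
                      "2017-10-29 09:00"])  # next day, dropped
# d == [5, 25]
```

Filtering for the "interesting" deltas (5, 10, 15, ...) as the timeline API does is then just a membership test on this list.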

73cc1f No.1099789


> trumptwitterarchive

Thank you for the suggestion. That was what I was looking for. I've got it done now.


70e498 No.1100463


You are on it!

Pain having to get the 2017 and then the 2018 from TrumpTwitterArchive but… it's the only way.

I guess I could suck all that in and then offer it as an api… just raw twitter data.

The only thing I found with the twitter data is that there's a 9-day gap in January at the beginning of 2018. I've been fighting off a compulsion to archive those (manually) to make it complete.


css : You can just use the twitter magic.

https:// dev.twitter.com/web/overview

On the smash page I just make links and decorate with the bird and tweet. The timeline does it automagically.

Here's a question for you.

How hard would it be for you to remove all the inline style you have on q-questions.info/research-tool.php ?

Do you know about jqueryui themeroller?

Conjigger your jqueryUI website and then download the custom css like magic.

73cc1f No.1100644

File: 53c8eeb1f110c85⋯.png (85.12 KB, 1207x839, 1207:839, Research-Tool-1.png)

File: 9a50c1627d356f1⋯.png (150.58 KB, 1209x841, 1209:841, Research-Tool-2.png)


I've pulled it into the same database that contains the chan posts. I don't want to make too many exceptions to how I do things. That makes it more difficult to keep track of what is for what.

73cc1f No.1100681


Eventually, the cream of the project will be going into that WP site that's at the front of the URL. That will take care of appearances nicely.

73cc1f No.1100740


And all of the text is back up there now. People won't have to request searches anymore.

73cc1f No.1117987

File: 3846409f0747842⋯.png (839.64 KB, 1232x881, 1232:881, offline_only.png)


>Archive OFFLINE immediately.

>Offline only.


I'm not sure what was meant by the recent Q post. Does that affect our work? And what is the scope of the request?

70e498 No.1119101


Kinda wondering about that myself.

IMO, he was talking specifically about the NP/NK video. Many have archived that offline.

On one hand, I'm archiving online - but that makes it easier for others to archive.

On the other hand - I'm archiving at home too.

The online stuff I'm doing has no bearing on my archives. I put it online so others could use it.

4202ae No.1119154


Hardcopies. Print out things. Copy files to USB/CD/DVD. Place inside a safe or, better yet, a faraday cage. Use means that are hard to destroy: items that are not online and can't be erased by a virus or EMP. It's not just for you, but for the Country. Think of everyone as an offline version of "the cloud," but with a hard copy.
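A small companion to that advice: a checksum manifest lets you prove later that a copy (USB, DVD, whatever) still matches the original byte for byte. A minimal sketch (Python, stdlib only; the filename in the demo is made up):

```python
import hashlib
import tempfile
from pathlib import Path

def manifest(archive_dir):
    """Map each file in the archive to its SHA-256, so any copy can be verified later."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
            for p in sorted(Path(archive_dir).iterdir()) if p.is_file()}

# demo against a throwaway directory
d = Path(tempfile.mkdtemp())
(d / "bread_608.html").write_bytes(b"archived bread")
m = manifest(d)
print(m["bread_608.html"][:8])
```

Burn the manifest alongside the files; re-running it on the copy and diffing the two dicts catches any rot or tampering.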

73cc1f No.1125149


That was the reason I finally ended up putting it online as well. It seemed a shame to keep that functionality to myself. I reworked a few things to make it better in a multiuser environment. It ended up being better for myself as well.

73cc1f No.1134796


I believe USB flash drives can lose their charge over the years. CDs and DVDs are your best choices.

73cc1f No.1159341

Don't mind me. I'm just trying to find some missing posts.

>>309741, >>309240, >>209205

adc966 No.1159568


Anyplace we can download your stash?

73cc1f No.1164429

Well, now I feel stupid. I just realized there's an "Expand all images" link in the lower right of the page header. Had I realized this, I would not have lost so many full-size images. One save could have been done in thumbnail mode, and another in full-size mode, and I would have had everything on the page.

73cc1f No.1164952

File: c2a311efc179b6a⋯.png (145.3 KB, 1000x843, 1000:843, top-of-page.png)

The ctrl-S method of saving a page will NOT automatically pick up the full-size images when in thumbnail mode. If the page is expanded when the save is done, then you'll get the full-size images (but not the thumbnails, though this is a minor issue).

So here's my suggestion to get the best archiving:

You can save once or twice, but one of the saves should be in expanded mode. If you want the thumbnail mode as well, then that's a separate save.

(All of the official archives so far have been in thumbnail mode.)
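The thumbnail-vs-expanded pitfall can be checked after the fact by listing which full-size media links a saved page actually contains. A sketch (stdlib only; the /file_store/ paths in the demo are illustrative, not necessarily the real 8ch layout):

```python
import re

# extensions from the board's allowed file types
IMAGE_EXT = re.compile(r"\.(?:jpe?g|png|gif|webm|mp4|pdf)$", re.IGNORECASE)

def image_links(html):
    """Return href targets that look like full-size media files.
    Thumbnails live in <img src=…>, so only the <a href=…> wrappers count."""
    links = re.findall(r'href="([^"]+)"', html)
    return [href for href in links if IMAGE_EXT.search(href)]

page = ('<a href="/file_store/abc.png"><img src="/file_store/thumb/abc.png"></a>'
        ' <a href="/faq.html">FAQ</a>')
print(image_links(page))  # ['/file_store/abc.png']
```

If a save done in thumbnail mode shows media hrefs that aren't on disk, you know a second expanded-mode save is needed.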

70e498 No.1171658

Huh. Anon never showed up to drop his image link on us?

73cc1f No.1176955


Not yet, apparently. It's a lot of files. It's going to take some time to upload them all, possibly a few days. Even my thumbnail image set takes a long time to upload.

90e281 No.1235051


Still nada.

407540 No.1235261


You are so wrong faggot

She only asks that her comments page is respected, that's who she deems as her people. Do some research before you fuck up your own opinions next time

131565 No.1239115


Unfortunately, we're anonymous here. I have no idea how we can even check on something like this.

90e281 No.1256952

Anon asked about the JSON for all Q posts.

The API is still there, I just removed all the links.


90e281 No.1261359

Anon asked for a word count in all Q posts and I did it really quick. Just gonna drop this here.

Here's the results, sorted by occurrences.
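For anyone who wants to reproduce the quick count, it's a few lines once the Q posts are gathered as plain text (a sketch; the simple tokenizer is an assumption):

```python
import re
from collections import Counter

def word_counts(posts):
    """Count word occurrences across a list of post bodies, most common first."""
    words = Counter()
    for body in posts:
        words.update(re.findall(r"[a-z0-9']+", body.lower()))
    return words.most_common()

posts = ["Future proves past.", "The future is now. Trust the plan."]
print(word_counts(posts)[:3])
```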


4c77f1 No.1293496



U was just at the bottom of their post. Like…


I'll search for the posts you added to your db. Thanks for hunting around for them!

#WWG1WGA #TheGreatAwakening #ItsSpiritual

354bf8 No.1294544

File: 0a76d8e37c88540⋯.gif (22.25 KB, 334x379, 334:379, 54790.gif)

Why Did George Bush Buy Nearly 300,000 acres in Paraguay?

0e52bb No.1300223


I finally found one by grepping around in the json files. I'm searching for more, but here's an example.

https:// 8ch.net/qresearch/res/932740.html#933285
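Grepping the JSON archives for a missing post can be scripted, too. A sketch that assumes vichan-style thread JSON where each post object carries a `no` field (adjust the field names if the real dumps differ):

```python
import json
import tempfile
from pathlib import Path

def find_post(json_dir, post_no):
    """Scan saved thread JSON files for a post number; return (filename, post) or None."""
    for path in sorted(Path(json_dir).glob("*.json")):
        thread = json.loads(path.read_text())
        for post in thread.get("posts", []):
            if post.get("no") == post_no:
                return path.name, post
    return None

# demo against a throwaway archive directory
archive = Path(tempfile.mkdtemp())
(archive / "932740.json").write_text(json.dumps({"posts": [{"no": 933285, "com": "example"}]}))
hit = find_post(archive, 933285)
print(hit)  # ('932740.json', {'no': 933285, 'com': 'example'})
```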

de2dc6 No.1300381



19c17f No.1310801

Sorry for popping in on you re: this, but the Anon I was speaking with about "time stamps and markers" said to come here. They have info for me to be able to start working on it.

There was a thread dedicated to this, but it appears to be missing now, or I keep missing it.

19c17f No.1310819


BTW, great work Anons!!

I wish I could help, but it's a bit beyond my capabilities.

131565 No.1310983


I'm a little behind in my archives at the moment, but that should be remedied by this evening. (I was busy working on my tools.) My site is a good one for looking at tweets vs. Q posts because I can show them on the same timeline.


19c17f No.1311143


That is fine and thank you. Can you tell me what is Q's marker that I should look for?

131565 No.1312967


Q's trip codes are listed at the top of the general threads on this board. On my site, known Q posts are shown in green.

131565 No.1313047


For some reason, posts on Q's new board aren't saving properly (except the first post on the thread). But I'll have everything else up there shortly.

19c17f No.1313371



Got the - http://q-questions.info/research-tool.php

Got - https://qanon.pub/

and another that has actual screenshots of Q's posts

been doing some research on Q's marker and need to clarify then will start on "wind the clock"

Much appreciate all the help - I need to do a better job at bookmarking important info on decoding Q.

19c17f No.1313627


Found this in the QMap PDF thread. Going to try to locate Anon because there's no sense in duplicating work.

"Anonymous 01/28/18 (Sun) 10:17:16 ID 3c320a No.190706


"Thank you for all this hard work. One thing that I think would really help: if the book could include all the Q posts with time stamps, including the early posts before the trip code. This needs to be searchable by time stamps (EST). The time stamps and dates could be either with each post or in the front with a reference to the post. I find that the time stamps are important to first identify Markers. I currently have to jump from time stamp search to marker search, and most databases I use are not complete with the latest posts. This would be extremely helpful. Thank you Anon. Truly a Patriot! One other thing: some links to Q posts are 404 when clicked, so I can't find the related time stamp."

90e281 No.1313635


You may be looking for the Delta thread.


I think I told you to come here. I did some Delta work here.


That Delta is only considering the difference between a Q post and a DJT tweet. There is nothing in there to account for DJT corrections of deltas between tweets.

The deltas you see on the smashpost page are spread out across the Q posts - since there is a different delta for each.

IE: Q posts at 12:00p

DJT tweets at 12:10p [10] delta

Q posts at 12:05p <- this would also mean the DJT tweet at 12:10p is also a [5] delta.

I did it like that because I wasn't sure of the meaning of all deltas. Is a [29] valid? Only on the 5's? Good luck anon! Let us know what we can do to help.

90e281 No.1313667


I think most everything we've been doing here has been resolved to either GMT or Zulu time. The 8ch JSON comes in GMT/Zulu. The TrumpTwitterArchive comes in GMT/Zulu.

Correct me if I'm wrong codefags.
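For anyone checking: the `time` field in vichan-style thread JSON is a Unix epoch, so normalizing everything to GMT/Zulu is one timezone-aware call (a sketch; verify the field name against the actual dumps):

```python
from datetime import datetime, timezone

def to_zulu(epoch):
    """Convert a Unix epoch (seconds) to a GMT/Zulu timestamp string."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

print(to_zulu(1526400000))  # 2018-05-15 16:00:00
```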

19c17f No.1313775


So, just to confirm, the Deltas are the marker?

And I will pop over there and see what is going on. Thank you.

The work being done via this thread is very important. Thank you Anons!!

131565 No.1313946


Yes, my posts are saved in GMT also.

I've about got the issue with the new Q board taken care of. I just needed to tell my database about it. I'm getting those posts ready to upload now.

As it happens, I'm currently working on setting up special search types that you may find useful. One of those search types will show just the Q posts and POTUS tweets. That way, you won't have to think about the proper way to limit your searches if that is what you are after. Look for that in the next day or two. I'm still working on finalizing that feature.

90e281 No.1314163


The deltas are what help you find the marker.

IE: Q posts something about "win"; 5 mins later DJT posts something about "Goodwin". That's a marker. (Just an example - I don't remember the deltas on the Goodwin marker.)

The Delta thread is where the work has been done on deltas. I'd like to see definitive documentation of confirmed markers.

131565 No.1319771


It would not be difficult at all to include calculations in my displays. So let me double-check what the logic should be.

When displaying a Q post

– show delta since last Trump tweet.

When displaying a Trump tweet

– show delta since last Trump tweet

– show delta since last Q post

Is there anything else?
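Those two display rules could be sketched as one merge over both timelines (a sketch only; assumes both lists are epoch-second timestamps already normalized to GMT):

```python
def deltas(q_times, tweet_times):
    """Walk both timelines in order and report deltas per the rules above:
    a Q post shows the delta since the last DJT tweet; a DJT tweet shows
    the delta since the last DJT tweet AND since the last Q post.
    Times are epoch seconds; deltas come out in whole minutes."""
    events = sorted([(t, "Q") for t in q_times] + [(t, "DJT") for t in tweet_times])
    last = {"Q": None, "DJT": None}
    out = []
    for t, kind in events:
        if kind == "Q" and last["DJT"] is not None:
            out.append((t, "Q", "since last DJT", (t - last["DJT"]) // 60))
        elif kind == "DJT":
            if last["DJT"] is not None:
                out.append((t, "DJT", "since last DJT", (t - last["DJT"]) // 60))
            if last["Q"] is not None:
                out.append((t, "DJT", "since last Q", (t - last["Q"]) // 60))
        last[kind] = t
    return out

# Q at 12:00, DJT at 12:10 -> the [10] delta from the example upthread
print(deltas([43200], [43800]))  # [(43800, 'DJT', 'since last Q', 10)]
```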

131565 No.1321452


I added delta calculations. Check it out and let me know if it's what you need.


90e281 No.1326496



Looking good!

Checking the Show Delta box seemed to kill off any results for me tho. I'll try again later!


I believe you are nearly correct.

Once you have found a [marker], then the time between DJT tweets/Corrections appears to be the indicator of another marker. I don't think it goes back to a Q post delta.

Check the logic for the [5] & [1] markers.

I disregarded all negative deltas (any tweet BEFORE a Q drop). There's information there possibly - but it just introduced too much noise into the results.

131565 No.1329318


I didn't even attempt to find the series. I'm simply showing the delta between the last of either. I suppose I could. So what is the pattern we are looking for?

131565 No.1329416


Not sure how checking the box kills results. The logic of the check box is implemented in a way that does not affect the search logic. The deltas are calculated after the fact. The actual SQL statement that creates the results is at the top of the page. That doesn't change. Still, I've seen unusual and unexpected things before. What are you seeing that has you thinking there's a difference in the search results?

131565 No.1341497

Never mind. The whole darn thing broke. I had overhauled the search logic to better support the data prep steps, and I guess stuff got messed up in the process. When I get done being disgusted about that, I'll fix it.

8dde8e No.1342663



I wonder if what Q is referring to is the Legal Status of the US., Macron brought a new contract to sign for Trump in conservatorship. That the old, legal status with the Rothschilds is no longer in effect due to bankruptcy.

131565 No.1381398


I have no idea how defined() can return FALSE and yet the value be correctly set. Anyway, the program has been fixed, I believe.

90e281 No.1445611

Hows it looking you faggots? Things progressing as designed?

I got a nagging image issue sorted out. Now archiving Q images and reference images to my site. Just about ready to get back on the elasticsearch idea.

131565 No.1446207


I have no idea what elasticsearch is. Would you care to explain?

I'm still working on things. At the moment, I'm adding some editing features to the research-tool version of things that I'd had in a prior tool. If you've noticed, older posts on my site have thumbnails and screenshots of links from the posts. And I've also started some work on the flagging feature so that I can feel better about putting all of the images back online rather than just selected ones.

90e281 No.1446391


Superfast multitenant full-text search for JSON. Clients in Java, C#, PHP, Python, Apache Groovy, Ruby, etc…

I think all I need to do is write something that will input all my JSON into my local elasticsearch instance, and then all lights are go.
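For what it's worth, "input all my json" for Elasticsearch usually means building an NDJSON body for the `_bulk` endpoint: one action line, then one document line, per post. A sketch of just the payload-building step (the index name and fields are assumptions):

```python
import json

def bulk_body(posts, index="qresearch"):
    """Build an NDJSON payload for Elasticsearch's _bulk API: for each post,
    an action line naming the index/id, then the document itself."""
    lines = []
    for post in posts:
        lines.append(json.dumps({"index": {"_index": index, "_id": post["no"]}}))
        lines.append(json.dumps(post))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_body([{"no": 494745, "com": "A place for codefags"}])
print(body)
```

POSTing that body to `/_bulk` with `Content-Type: application/x-ndjson` is the remaining step; batching a few thousand posts per request keeps the import reasonable.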

7daa5d No.1506424

File: ffe64e00e6b4a69⋯.jpg (378.29 KB, 1200x900, 4:3, 1459522989864.jpg)

I've heard whispers of Q + Team posting at set time intervals

Worthwhile to investigate

How to visualize?

Side by side threads (yes, whole threads!) + time lines (with colours)

Helluva Job, No doubt, but who else to ask .. ?

6324a9 No.1529968

File: 8dd1a603b9caee8⋯.png (111.39 KB, 633x318, 211:106, Search suggestion.PNG)

Saw this on Qresearch and didn't know if it had any merit. Leave it to the experts.

90e281 No.1530201


MMmmm Yes I have. I like the idea.

There are many services out there that will allow you to do this or you can create your own blockchain w/ ethereum.

Were you thinking just qposts or all qresearch?
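Whichever service ends up being used, the core of the idea is the same: hash the posts, combine the hashes into one digest, and anchor that single digest on-chain. A sketch of just the hashing half (no particular service's API is assumed):

```python
import hashlib

def batch_digest(posts):
    """SHA-256 over the sorted per-post hashes; anchoring this one digest
    on a blockchain timestamps the whole batch at once."""
    hashes = sorted(hashlib.sha256(p.encode()).hexdigest() for p in posts)
    return hashlib.sha256("".join(hashes).encode()).hexdigest()

d1 = batch_digest(["post one", "post two"])
d2 = batch_digest(["post two", "post one"])  # sorting makes it order-independent
print(d1 == d2, len(d1))  # True 64
```

Any later change to any post changes the digest, so one on-chain transaction covers either just the Q posts or the whole board, depending on what goes in the batch.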

6324a9 No.1530470



>Were you thinking just qposts or all qresearch?


>My goal is to see ALL of the board searchable

Please see the first thumbnail in the OP, and the post referenced here. (I'm the OP)

2ffc90 No.1548098



htt ps://ste emit DOT com SLASH wikileaks/@ausbitbank/the-great-wizard-of-leaks-a-blockchain-fantasy-action-adventure-epic

131565 No.1591211


Just got back from vacation and saw this. My site can display Q posts and Trump tweets in the same search results in time order.


I just got back from vacation, so my archive is over a week behind at the moment. I should be more current in a few hours.

131565 No.1732451

Last night, anons were discussing the fact that the chans are part of history. Concern was expressed about the shill impacts on the boards and that perhaps there needed to be a cleaner view of it all. I suppose one answer could be to get back to the original purpose intended for the private version of my database, which is to identify what should be included in the blog that is in the root directory of the site. I haven't actually updated anything there in quite a while. Maybe it's time to get back to that.

90e281 No.1733098


Sounds like a good idea. Probably a lot of work!

4ca1a6 No.1733192

>>1732671 (prev.)

I have heard estimates of Roth wealth in the area of 400-500 Trillion dollars.

131565 No.1733203


It HAS been a lot of work and will continue to be. I've been coasting for a bit, just making sure that the general threads have been archived and made available. But there's also a lot of processing to do with the data if the ultimate goal is to be achieved as imagined. Kinda wish there was a way to safely share the work.

90e281 No.1733870


I heard that. I coasted about 2 weeks for the same reason. I've been working on tightening up the site and working on small bugs I've found.

I implemented a search for Q posts and am working on the big bread search now.

131565 No.1734049

One of the tricky things about making my research tool available publicly is that the platforms are different. Different operating system, different database, and (apparently) different PHP. So I may have something working perfectly on my development machine, but I find there are problems when I try to share it. If the focus is to prepare the blog, which is an abridged view, then maybe I shouldn't sweat it if what I have shared publicly doesn't always work?

90e281 No.1735439


Ahh you've entered the big new world of internet interoperability! The internet is great, but it's not always the easiest to move data from platform to platform.

It's one of the reasons I stuck with straight JSON. Platform independent. Easily shared. Do you have the capability to transform into JSON/XML? What is your end goal? Share the database? Share the data? The app itself?

131565 No.1736163


I probably do. It's all databased. I'd just have to put stuff into a structure and run a json_encode() on it. Not sure it would be all that easy to put the advanced features into the JSON, though. And it doesn't solve the problem of making something accessible for non-techie types, which is my goal.

90e281 No.1774255


Big bread search update.

965f24 No.1865219

File: 27a802435b8d244⋯.png (222.17 KB, 1330x741, 70:39, ss (2018-06-22 at 12.57.57….png)

http://YaCy.net – distributed search engine – has 17 hits for clean query {Q Clearance Patriot}. Kek.

But we should probably download the software and seed a lot moar…

965f24 No.1865264

File: d84d58488a73599⋯.png (156.88 KB, 1376x862, 688:431, download.png)


Just for kicks another search

131565 No.1865819


Certainly a page could be made for telling people how to search the original sources. Maybe it could include input fields as well to help people get it right. Unfortunately, original sources have been hacked from time to time, and some material is no longer available.

cabbaa No.1873487

Hi there anons, just stumbled on this thread in my search for a collection of notables.

Anyone thought of putting them together in a thread/breads?

What were/would be pros/cons of doing such?

Data duplication, Too big etc.

Are there easy ways to make/view/access such collection?

131565 No.1876296


My project has the capability of searching by threads.

As for breads, I'd been working toward that, and I'll probably get back to it soon. The challenge of breads is a bit tougher because they must be identified. So far, my own solution has been a combination of automation and inspection.
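The automated half can get most of the way there from subject lines alone; a sketch (the regex is an assumption tuned to titles like "Q Research General #608", so the inspection pass still catches the oddly-named ones):

```python
import re

BREAD = re.compile(r"Q Research General #(\d+)", re.IGNORECASE)

def bread_number(subject):
    """Return the bread number from a thread subject, or None for non-generals."""
    m = BREAD.search(subject or "")
    return int(m.group(1)) if m else None

print(bread_number("Q Research General #608: Some Edition"))  # 608
print(bread_number("Memes #19"))  # None
```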

cabbaa No.1879714


Hey TY for getting back to me about this anon.

Your solution is similar to mine I see.

It is why I'd like to have a blogroll with exclusively notables, scraped from all breads by automation, so I could inspect the works thereof.

90e281 No.1963374


I may be back to Solr not being a good solution.

In trying to create a prebuilt index I've discovered that either

a) javascript just doesn't have enough memory to do it

b) javascript times out before it gets done and nothing happens.

I'm going to take a closer look at this


90e281 No.1971866


Moar testing today. Solr is NEVER going to work in this instance. I was hoping that I could just create an index on my dev machine and save that off and then use a worker process to add to the index. I've got one other idea to see if I can bend it to my will - but so far no workie. From what I can tell it's not possible to add to the index - it needs to be completely regenerated when you add a new document.

I don't understand how other people can add so many docs to the index and have it work. My tests were showing it to run for 12+ minutes just to generate an index and it never finished.

I'm open to new ideas if anybody has one.

The custom Google search I've got on there now does seem to work, but again it's not ideal. What I want is a list of POSTS that match; the goog search seems to find the matches but only returns complete breads. You still have to Ctrl-F to find what you were looking for within the bread.

I can put together a test harness for Solr if anybody wants to see if they can figure out a way to make it go.

90e281 No.1972297


My gut is telling me that my next best option is to move into a database in order to accomplish the bigbreadsearch. It's probably possible to do using a hosted elasticsearch solution (https://www.elastic.co/cloud @$50/mo)

On the other hand, I think that I can write an app to fill a database in a couple hours, and it would solve a few of the problems I was seeing in the other search tech. Most of the good search engines will plug into a database anyways so I think this is probably the direction I'm headed.

131565 No.1993030


$50/month seems like a lot. My cost isn't nearly that much.

90e281 No.1997608


For elastic search?

90e281 No.1997626


>$50/month seems like a lot. My cost isn't nearly that much.

Derp. I clicked the wrong post.

I agree - which is why I haven't done anything on it. My hosting costs a bit more than that - ANNUALLY.

I feel like a DB is just going to be a better solution now. I'd hoped that I'd be able to just do everything with straight JSON - but alas! You cannot.

I guess I need to find the best search engine to plug a DB into now. I'm hoping to write the code to insert my existing data into the database today, write code to insert new data into the DB tomorrow.

131565 No.2004075


That sounds like a software lease.

131565 No.2004084


MySQL and MariaDB have a natural language search capability built in. Have you checked to see if that meets your needs?
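The MySQL/MariaDB form is MATCH(col) AGAINST('terms' IN NATURAL LANGUAGE MODE) on a FULLTEXT-indexed column. As a stand-in that runs anywhere, here's the same shape using SQLite's FTS5 (a sketch; the two-column layout is an assumption, and FTS5 must be compiled in):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# the FTS5 virtual table plays the role of MySQL's FULLTEXT index here
con.execute("CREATE VIRTUAL TABLE posts USING fts5(no, com)")
con.executemany("INSERT INTO posts VALUES (?, ?)",
                [("494745", "A place for codefags to make the chans searchable."),
                 ("495890", "a list of tags that anons could enter")])
rows = con.execute("SELECT no FROM posts WHERE posts MATCH 'searchable'").fetchall()
print(rows)  # [('494745',)]
```

Either engine gives post-level hits straight from SQL, which is exactly what the Google custom search couldn't do.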

e8d48e No.2009885

File: fef4a8d9dda3055⋯.jpg (35.89 KB, 300x300, 1:1, QSEARCH.jpg)


Anons might find this useful.

Doesn't work so well with images but is good for keyword searches.

90e281 No.2013368


Yeah. It's a hosted service. It appears that deploying a custom elasticsearch instance is probably a large pain in the ass most folks don't want to deal with.


I currently have SQL Server set up and my host gives me a database, so I'll probably go with that.


WTFERK? We already have like 3 bread searches now? Am I totally wasting my time?


Interesting! Tell me more about how you are doing this. Search seems to be pretty quick. Are you using a DB backend? Straight text search? Is all this in PHP?

I've managed to import all the JSON data I have on hand. 1,569,777 posts took 25mins to import. My DB design is ultra simple. Single table that virtually matches the JSON data structure. There's no telling what the performance is going to be like just yet. Even getting a count takes 16 seconds. Ugh.

I'll run some simple tests later to see what I can figure out.
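The import shape that tends to hold up is batched inserts into a table with a primary key; a sketch using SQLite in place of SQL Server (same idea, different driver; the three columns are assumptions from the JSON structure):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
# primary key on the post number: dedupes re-imports and speeds lookups
con.execute("CREATE TABLE posts (no INTEGER PRIMARY KEY, time INTEGER, com TEXT)")

def import_thread(raw_json):
    """Batch-insert one thread's posts; executemany avoids per-row round trips."""
    posts = json.loads(raw_json)["posts"]
    con.executemany("INSERT OR IGNORE INTO posts VALUES (:no, :time, :com)",
                    [{k: p.get(k) for k in ("no", "time", "com")} for p in posts])
    con.commit()

import_thread('{"posts": [{"no": 1, "time": 1526400000, "com": "first"},'
              ' {"no": 2, "time": 1526400300, "com": "second"}]}')
print(con.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```

A slow COUNT(*) on the real server is usually a sign the engine is scanning the whole heap; a narrow index (even just the primary key) typically gives it something much smaller to walk.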
