>>2196679
Search relevancy is the secret sauce behind a search. It's what makes some better than others.
Have you tried any of the search tools linked above?
Here's what I discovered working on the search. It was all in the old bread search thread.
On my site the current archive search covers everything on my site and uses the Google algos to determine relevancy. You could read up on how everybody thinks it's done, but Google never says. It returns reasonably accurate results, but only as full breads, not as individual posts.
The Q posts search on my site uses a local install of Lunr. It searches ONLY the q drops and can be tweaked to provide better results. The Lunr search is straight text, no images/image names etc. Post name, Subject, Dates, trips, and post text are all included in the index. Search results are returned as individual posts.
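To show what "fields in the index" and "tweakable" mean, here's a rough sketch in Python. This is NOT Lunr and not the code on my site, just a toy inverted index over the same kind of fields. The sample posts, the field names and the boost numbers are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical per-field weights. Tweaking these is what changes the ranking.
FIELD_BOOST = {"subject": 2.0, "text": 1.0, "name": 0.5, "trip": 0.5, "date": 0.5}


def tokenize(value):
    """Lowercase a string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", value.lower())


def build_index(posts):
    """Map each term to {post_id: accumulated weight} across all fields."""
    index = defaultdict(lambda: defaultdict(float))
    for post in posts:
        for field, boost in FIELD_BOOST.items():
            for term in tokenize(str(post.get(field, ""))):
                index[term][post["id"]] += boost
    return index


def search(index, query):
    """Return (post_id, score) pairs, best score first, ties by lowest id."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for post_id, weight in index.get(term, {}).items():
            scores[post_id] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample posts, only here to exercise the functions.
SAMPLE_POSTS = [
    {"id": 1, "name": "Anonymous", "subject": "Awan brothers", "date": "2018-01-01",
     "trip": "", "text": "server access"},
    {"id": 2, "name": "Anonymous", "subject": "", "date": "2018-01-02",
     "trip": "", "text": "awan and the server"},
    {"id": 3, "name": "Anonymous", "subject": "", "date": "2018-01-03",
     "trip": "", "text": "nothing here"},
]

if __name__ == "__main__":
    idx = build_index(SAMPLE_POSTS)
    print(search(idx, "awan"))
```

Note that it hands back post IDs plus a score, same as the real engines do. A hit in the subject counts double here only because I set the boost that way.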
I was trying to build out a search that could be local to my archive, but with tweakable results returned in individual posts. In doing that I ran into several problems:
1) The index MUST be prebuilt. My archive has nearly 3000 breads. Due to the way the archive is pure json (no current database), each bread must be read in. My tests showed that it took about 5-10 minutes to read everything in depending on system load.
2) Search results from the engines I looked into (Lunr, Solr, elasticsearch, Sphinx, Lucene etc) all return results with the ID of the result + rank, not the actual result. You have to code it to return the results themselves. This opened up other issues in using the API for this. 5000 result post IDs would ultimately be passed via the querystring into the API. Not possible. The querystring can't be that long. Fail.
3) There is a limit in the size of the json data that can be effectively sent to a client for js use. 10MB seems to be the upper limit without killing off the browser. 3000 breads * 751 posts = 2,253,000 total posts. I figure 10MB of results is about 5,000 results avg. The current breadarchive.json is about 10MB due to json being a text based key/value object format. You can see it in action with this search on 'awan'
qanon.news/api/smash?search=awan&xml=true Result is in xml for magical readability. This is a straight text search of the text node only, no Lunr, no database.
4) A javascript search is therefore never going to work because there is simply too much data. (Lunr is out) The search must be done with a better server based technology (Solr, elasticsearch, Sphinx, Lucene etc). The issue then is having to install it as a service which requires dedicated hosting which comes with a more substantial pricetag. $30-50/month. My current hosting costs me $12/year. Unfortunately I'm not richanon.
5) Getting everything into a database and using plain text search would move us towards a better solution, but it has its own issues. DB size, DB platform, and result accuracy. I'm pretty sure the other 2 searches linked above are using this technology. The big search engines will all plug into pretty much any database available or even json.
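One way around points 2 and 3, sketched in Python: keep the ranked IDs on the server, look up the posts there, and only send the client one page at a time. That way no ID list ever goes through the querystring and the payload stays small. The function names, the page size of 50 and the 7 digit post IDs are my assumptions, not anything the engines or my API actually define.

```python
import json


def id_list_length(post_ids):
    """Length in characters of a comma-separated ID list, as a querystring would carry it."""
    return len(",".join(str(post_id) for post_id in post_ids))


def hydrate_page(ranked, posts_by_id, page=1, page_size=50):
    """Turn one page of (post_id, score) pairs into full posts, server side.

    ranked is assumed to be sorted best-first already. IDs that are missing
    from posts_by_id are skipped rather than raising.
    """
    start = (page - 1) * page_size
    window = ranked[start:start + page_size]
    return {
        "total": len(ranked),
        "page": page,
        "page_size": page_size,
        "results": [
            {"score": score, "post": posts_by_id[post_id]}
            for post_id, score in window
            if post_id in posts_by_id
        ],
    }


if __name__ == "__main__":
    # 5000 hypothetical 7-digit post IDs, the case from point 2.
    ids = range(2_000_000, 2_005_000)
    print("querystring chars for 5000 ids:", id_list_length(ids))

    # Made-up ranked results and posts to show the paging.
    ranked = [(post_id, 1.0) for post_id in ids]
    posts = {post_id: {"id": post_id, "text": "post %d" % post_id} for post_id in ids}
    first_page = hydrate_page(ranked, posts, page=1)
    print("bytes sent for page 1:", len(json.dumps(first_page)))
```

The client then asks for something like page=2 instead of passing IDs back. The catch is the server has to rerun or cache the search for every page, which is one more reason it wants a real engine behind it.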
Everything indicates we've got to use a real search engine to do what we want.
So all this leads me to a couple different conclusions. Yeah I can do a search using a big engine - locally.
So for me I've got 2 options. I can set up to take donations/ads and try and raise enough money to finance the hosting needed to provide the search.
OR
Anons can set up sacrificial lamb server(s) on their home network available to the internet that has one of the big engines on it. Several of the big engines support distributed type topology so we could have say 5 different search servers to provide redundancy/speed. 1 is always a single point of failure. The price is right, but it does come with some risks.
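The redundancy part of that idea, as a rough Python sketch: try the volunteer servers in random order and take the first one that answers. The server names and the fetch function are placeholders I made up. A real engine's own clustering would normally handle this for you, so treat it as the client side idea only.

```python
import random


def search_with_failover(query, servers, fetch, rng=random):
    """Try each server in random order and return (server, results) from the first that works.

    fetch(server, query) is a caller-supplied function that does the real
    request and raises an exception on failure. Random order spreads load
    so one box doesn't take every query.
    """
    order = list(servers)
    rng.shuffle(order)
    errors = {}
    for server in order:
        try:
            return server, fetch(server, query)
        except Exception as exc:  # any failure just means "try the next one"
            errors[server] = exc
    raise RuntimeError("all %d search servers failed: %r" % (len(order), errors))


if __name__ == "__main__":
    # Hypothetical volunteer servers; only the last one is "up" in this demo.
    servers = ["anon1.example", "anon2.example", "anon3.example"]

    def fake_fetch(server, query):
        if server != "anon3.example":
            raise ConnectionError(server + " is down")
        return [(2196679, 1.0)]

    print(search_with_failover("awan", servers, fake_fetch))
```

With 5 servers, 4 can be down and the search still answers. It does nothing for the risks of exposing a home box to the internet.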
Until I can resolve this engine/hosting issue it's a no go for me. As usual it boils down to money. I've got ideas, know-how and time, but not much in the way of financial wealth.
I think I follow what you are asking about, but tell me more. I'm open to all ideas.