No.9637
everyone seems to be shilling either duckduckgo or startpage, but nobody really pays attention to searx. it's an open source metasearch engine which you can host yourself or use via a public instance.
https://github.com/asciimoo/searx
____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.
No.9661
>>9637
It looks like I could run my own search engine on the darknet using its source code.
No.9663
No.9671
the developer Adam Tauber uses a picture of Sonic the Hedgehog on github
>dropped
i'll stick with startpage
No.9672
>>9671
the avatar clearly makes everything better
No.9673
>>9671
> i don't like the avatar so the project is bad
No.9681
There are a few public instances that generally work well. There's https://searx.me and https://050.ca
No.9682
>>9663
https://searx.neocities.org/
I stopped using that a while ago because the instance list they were using was radically out of date, but then I checked back and it was fixed.
No.9701
>>9682
looks good, but there's no guarantee about whether logs are being collected or not
No.9829
Public instances seem to be having trouble getting results from Google for the most part. Self-hosting works well so far.
No.9844
>>9671
>he doesn't want to go fast
No.9850
I use DDG Lite with bangs, which is very fast, and i can use startpage for images and other stuff ddg can't handle. It also works very nicely with terminal-based browsers.
Searx looks very good until you see how slow and inaccurate it is. It only has a couple of bangs, and the image search always bitches about the google crash. May as well use startpage.
No.9851
No.9852
>>9851
Looks interesting. What happens when big G forces captchas down your throat?
No.9861
>>9850
>slow and inaccurate
i mean, it's not like there's over a hundred unlisted instances out there that aren't slow and pozzed with google captcha. the public instances on github are sucky.
No.9997
Self-host searx in docker and chill.
No.10045
>lul searx just use ddg or startpage
>never looked at the SearX settings
>doesn't know what a meta-search engine is or means
SearX searches other search engines, like DDG (that stands for DuckDuckGo, a more private front end for Yandex or Yahoo, whichever they're using now) or StartPage/IxQuick (a more private front end for Goolag).
So why in the heck trust these sites with your data directly when you can add an extra layer of privacy and search two (or more) of them at the same time?
No.10101
No.10737
>>9997
yeah, it only takes 10 minutes to set up.
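for reference, a minimal sketch of the docker route; the `searx/searx` image name, port, and preferences path here are assumptions from memory, so check the project's docs for the current image and settings:

```shell
# Run a searx container and expose the web UI on localhost:8080.
# Image name and port mapping are assumptions; verify against the searx docs.
docker pull searx/searx
docker run -d --name searx -p 8080:8080 searx/searx
# Then browse to http://localhost:8080 and pick your engines under Preferences.
```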
No.10802
Searx is just Google. It can't work without a secret Google captcha pass.
No.10803
>>10802
other searx instances work fine with google.
you can also change the search engines used in preferences.
No.10823
I was using searx.me and searx.xyz for a long time, but they stopped working, so i had to switch to startpage. But i found searx.be from here: https://stats.searx.xyz/
and it seems to be the best one in terms of SSL, CSP and response time. The only downside is it doesn't have a proxy option for websites. Other than that, it just works. For general search i use Wikipedia, Wikisource, DDG definitions, Duckduckgo and Google btw. I would use Startpage instead of Google, and also Yandex, but both seem to only give results for the first page.
No.10826
>>10802
You're a fucking idiot.
>preferences
>engines
>toggle duckduckgo
basically what >>10045 said
Also
>using brave
God, you're a faggot
No.12352
>>9637
I'll hold off on creating a thread for this for now, because maybe people here can answer my questions.
We all know that Google is shit. It didn't use to be shit. It used to rank sites based more on your search terms, but now the results are based almost entirely on popularity, and it's nearly impossible to find certain low-profile sites using Google. What's more, as far as I can tell, literally every other search engine refers to Google's databases for its results. That part could be wrong, and if anyone knows of a breakdown of which search engines use which website databases (or just a list of the unique databases), I would appreciate it, but as far as I can tell, they all use Google. Please give me a counterexample and I will try out its algorithm, but I suspect it will show similar results to Google anyway.
Otherwise, a database that could provide the sort of esoteric results common on the 1990s internet would be more fun to use, and probably more helpful too. As Google continues to tighten its algorithm and purge its database accordingly, I think there will eventually be an opening for a new search engine to fill a sought-after niche (namely, being good) that Google is committed to neglecting.
Perhaps search engines are shit now because there are more websites, so they can't keep databases of site data for every website like they used to (if they ever did). But now we have machine learning algorithms like word2vec and paragraph2vec, which suggest there is a way of learning an identifying vector for each website (site2vec?). Website data could not only be compressed into a relatively small identifying vector, but this would also allow cool new vector-based search criteria. Word2vec learns a vector for each word in the training documents, and those vectors appear to have arithmetic properties representing the meaning of the words. So e.g., if V(word) is the vector for a given word, researchers find that V(paris)-V(france)+V(italy) is very close to V(rome), and V(king)-V(man)+V(woman) is very close to V(queen)...
So now imagine every website represented by a vector. You could use the crazy vector properties of site vectors to find webpages based purely on your criteria. Say we take chan boards as basis vectors with different "personalities". What would be the result of searching for V(breitbart)-V(/pol/)+V(/x/)? Or how about V(youtube)-V(/b/)+V(/sci/)? But of course there could also be a way to search the vectors by content: first turn your search phrase into a vector suitable for comparison, V(phrase), then look for the V(site) closest to V(phrase) across all sites in the database.
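the analogy arithmetic is easy to demo with toy vectors; here's a minimal pure-Python sketch (the 4-d vectors are hand-made so the analogy holds by construction — real word2vec vectors are learned from large text corpora, this is illustrative only):

```python
import math

# Toy 4-dimensional "embeddings", hand-built so the analogy holds exactly;
# real word2vec vectors are learned from text, these are illustrative only.
vectors = {
    "king":  [1.0, 1.0, 0.0, 0.0],  # royalty + male
    "queen": [1.0, 0.0, 1.0, 0.0],  # royalty + female
    "man":   [0.0, 1.0, 0.0, 1.0],  # male + person
    "woman": [0.0, 0.0, 1.0, 1.0],  # female + person
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

def analogy(a, b, c):
    """Find the vocabulary word closest to V(a) - V(b) + V(c), excluding inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    scores = {w: cosine(target, v) for w, v in vectors.items() if w not in (a, b, c)}
    return max(scores, key=scores.get)

print(analogy("king", "man", "woman"))  # -> queen
```

the same nearest-neighbor-by-cosine step is what you'd run over V(site) vectors to answer a V(phrase) query.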
So anyways, what would it take to do this? What does it take to host a search engine? I'll try to learn about this. I'm proficient in HTML/JS and know some PHP.
Storage space: I've read there are around 1.5 billion websites in total. How big a vector (in bits) does it take to decompose a website by category? It doesn't have to be unique; say each vector is shared by 10 sites. 1.5 billion sites is 150 million vectors... that's 28 bits to distinguish between each vector, so let's say 30 bits per vector. 30 bits times 1.5 billion sites is ~6GB. That's 6GB to remember the entire internet.
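sanity-checking that arithmetic in a couple of lines (note 2^27 ≈ 134 million < 150 million, so the index needs 28 bits, and the total comes out just under 6GB):

```python
import math

# Back-of-envelope check of the storage estimate: ~1.5 billion sites,
# one 30-bit vector per site, each vector shared by ~10 sites.
n_sites = 1_500_000_000
n_vectors = n_sites // 10                      # 150 million distinct vectors
bits_to_index = math.ceil(math.log2(n_vectors))  # bits to tell them apart
total_gb = n_sites * 30 / 8 / 1e9              # 30 bits per site, in GB

print(bits_to_index, total_gb)  # -> 28 5.625
```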
Processing: Obviously the hard part would be actually learning these vectors. You would have to keep up with the 140k websites launched per day, though you could certainly whittle this down a lot if you restricted yourself to, say, websites hosted in a particular country, or on a particular topic, or websites that have been online for over a year. Ultimately you may not need to process many more websites than there are total currently, so 1.5 billion. I have no idea how long this would take. I haven't even learned the word2vec algorithm yet.
No.12353
>>12352
>So anyways, what would it take to do this? What does it take to host a search engine? I'll try to learn about this. I'm proficient in HTML/JS and know some PHP.
Yeah, you're definitely not the person to create a search engine of any kind.
No.12354
>>12353
If you actually knew more than I did about this topic, you would provide some context for why it's impossible, but you can't even do that, so you're obviously not the sort of person who can entertain such an idea in the first place.
No.12355
>>12352
Databases suck; use NoSQL, or no database at all.
As for search results, I think even Bing gives better image results than Google now.
Google used to offer the best results back around 2009, but now it's so shit that I don't even use it anymore.
First you need to set up a crawler (a spider/bot). Make your own, though I'm no expert. Second, it must not get blocked by cuckflare (One more step..) or other shitty detection tools.
From what I've heard, the <meta> tags are where your crawlers should be looking, but the problem is there are plenty of fake results that abuse the meta tags (like when you search for an mp3 file and the top 10 results don't even have the fucking thing, but instead offer you a virus adware toolbar).
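the meta-tag extraction step a crawler would run on each fetched page can be sketched with just the standard library; the HTML here is inline for illustration (a real crawler would fetch it over HTTP and respect robots.txt):

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags in a fetched page."""
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            name = d.get("name") or d.get("property")
            if name and d.get("content") is not None:
                self.meta[name] = d["content"]

# Inline sample page; a real crawler would download this with urllib/requests.
page = """
<html><head>
<meta name="description" content="An open source metasearch engine">
<meta name="keywords" content="searx, privacy, search">
</head><body>hello</body></html>
"""

parser = MetaExtractor()
parser.feed(page)
print(parser.meta)  # both collected meta fields
```

this is also the data you'd feed into any ML step that tries to learn the signature of meta-tag spam.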
No.12375
>>12355
Thank you. Making a crawler is definitely a good place to start. It didn't occur to me that there are services that would try to "protect" websites from my web crawler. Regarding meta tag abuse, if I can actually apply ML to that metadata, then it should be possible to learn the signature of dubious sites, even if by manually applying cluster analysis to the resulting site vectors.
No.12384
>>10802
Why would you use jewgle instead of a real search engine?
No.12792
>>12352
>scientists find that doing something like V(paris)-V(france)+V(Italy) is very close to V(rome), and V(king)-V(man)+V(woman) is very close to V(queen)...
Probably not totally different from what google does under the hood, but being able to be explicit about it could be killer. Imagine being able to categorize "normienigger" content so that when you want to know about some political topic you're not bombarded with retard gibberish from buzzfeed and its ilk. Use -V(normie) to filter out social media, or -V(soy) to filter out pop culture (not necessarily all of it, but the obvious prolefeed). Imagine having a way to tell youtube "I need videos about V('puters n' sheeeeit) but don't want any V(streetshitters) speaking their unintelligible patois." Imagine google image search, except not subverted to work incorrectly.
Imagine being the dude who returned truth and beauty to the internet, just by offering a service that isn't fucked up or retarded by design.
>I'm proficient in HTML/JS
From what I see you can do just about anything with JS these days. There's TensorFlow.js, for instance, or a translation layer to Python.
No.12793
SearX is a meme. If you use a public instance, you're trusting a random person; if you host your own instance, it's the same as searching google directly from an IP you own. Just use duckduckgo, and if you don't trust them, use their .onion service so they can't track you even if they wanted to:
https://3g2upl4pq6kufc4m.onion/
No.12939
>>12793
searx lets you search multiple engines simultaneously and strips the tracking urls search engines typically bundle. if you self-host on a vps and share it with multiple people, you get to blend your traffic with others' and avoid using your home ip. searx fits the average joe's threat model perfectly.
No.12983
Has anyone got this working properly on Dillo and/or w3m? I can search just fine, but the cookies are broken for some reason. As in, they are generated wrongly / don't conform to my configuration.