/g/ - Technology

File: 255c14e0935be57⋯.png (9.33 KB,631x237,631:237,logo_searx_a.png)

 No.9637

everyone seems to be shilling either duckduckgo or startpage, but nobody really pays attention to searx. it's an open source metasearch engine which you can host yourself or use with a public instance.

https://github.com/asciimoo/searx

____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

 No.9661

>>9637

It looks like I could run my search engine on darknet by using its source code.

 No.9663

 No.9671

the developer Adam Tauber uses a picture of Sonic the Hedgehog on github

>dropped

ill stick with startpage

 No.9672

>>9671

the avatar clearly makes everything better

 No.9673

>>9671

> i don't like the avatar so the project is bad

 No.9681

There are a few public instances that work generally well. There's https://searx.me and https://050.ca

 No.9682

>>9663

https://searx.neocities.org/

I stopped using that a while ago because the instance list they were using was radically out of date, but then I checked back and it was fixed.

 No.9701

>>9682

looks good, but there's no guarantee about whether logs are being collected or not

 No.9829

Public instances seem to be having trouble getting results from Google for the most part. Self hosting works well so far

 No.9844

File: f3205dbed1518b1⋯.png (342.24 KB,423x450,47:50,shiggydiggy.png)

>>9671

>he doesn't want to go fast

 No.9850

I use DDG Lite with bangs, which is very fast and i can use startpage for images and other stuff ddg can't handle. It also works very nice with terminal-based browsers.

Searx looks very good until you see how slow and inaccurate it is. It only has a couple of bangs and the image search always bitches about the google crash. May as well use startpage.

 No.9851

 No.9852

File: 46e976ffc23c26b⋯.png (96.08 KB,1664x915,1664:915,ClipboardImage.png)

>>9851

Looks interesting. What happens when big G forces captchas down your throat?

 No.9861

>>9850

>slow and inaccurate

i mean, it's not like there's over a hundred unlisted instances out there that aren't slow and pozzed with google captcha. the public instances on github are sucky.

 No.9997

Self host searx on a docker and chill.

 No.10045

>lul searx just use ddg or startpage

>never looked at the SearX settings

>doesn't know what a meta-search engine is or means

SearX searches other search engines, like DDG (that stands for DuckDuckGo, a more private front end for Yandex or Yahoo, whichever they are using now) or StartPage/IxQuick (a more private front end for Goolag).

So why in the heck trust these sites with your data directly when you can use an extra layer of privacy, and search two (or more) at the same time?
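
To make the metasearch idea concrete, here is a minimal sketch of building one query that fans out to two engines at once. It assumes the instance exposes searx's `/search` endpoint with `format=json` enabled (plenty of public instances turn it off). `searx.example` is a placeholder host, not a real instance, and the engine names are assumed to match whatever the instance has configured.

```python
from urllib.parse import urlencode


def build_search_url(instance, query, engines=("duckduckgo", "startpage"), page=1):
    """Build a searx search URL that asks several engines in one request.

    `instance` is the base URL of a searx instance (placeholder in the demo).
    """
    params = {
        "q": query,
        "engines": ",".join(engines),  # comma-separated engine names
        "pageno": page,
        "format": "json",  # only works if the instance allows JSON output
    }
    return instance.rstrip("/") + "/search?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder host; swap in your own self-hosted instance.
    print(build_search_url("https://searx.example/", "open source"))
```

Fetching that URL with any HTTP client should return merged results from both engines, provided the instance permits it.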

 No.10101

File: e67621cd1d6002a⋯.png (44.89 KB,958x887,958:887,ClipboardImage.png)

 No.10737

>>9997

yeah, only takes 10 minutes to set up.

 No.10802

File: 325631b8bd060e8⋯.png (219.89 KB,1080x1920,9:16,Screenshot_20190727-171841.png)

Searx is just Google. Can't work without secret Google capcha pass

 No.10803

>>10802

other searx instances work fine with google.

you can also change the search engines used in preferences.

 No.10823

I was using searx.me and searx.xyz for a long time but they stopped working so i had to switch to startpage. But i found searx.be from here https://stats.searx.xyz/ and it seems to be the best one in terms of SSL, CSP and Response Time. Only downside is it doesn't have a proxy option for websites. Other than that it just works. For general i use Wikipedia, Wikisource, Ddg definitions, Duckduckgo and Google btw. I would use Startpage instead of Google and also Yandex but both seem to be just giving results for the first page.

 No.10826

File: e059099c1ddeaba⋯.png (614.02 KB,846x938,423:469,1561058712076.png)

>>10802

You're a fucking idiot.

>preferences

>engines

>toggle duckduckgo

basically what >>10045 said

Also

>using brave

God, you're a faggot

 No.12352

>>9637

I'll hold off on creating a thread for this yet because maybe people here can answer my questions.

We all know that Google is shit. It didn't use to be shit. It used to show sites based more on your search terms, but now the results are based almost entirely on popularity, and it's nearly impossible to find certain low-profile sites using Google. What's more, as far as I can tell, literally every other search engine refers to Google's databases for its results. That part could be wrong, and if anyone knows of a breakdown of which search engines use which website databases (or just a list of the unique databases), I would appreciate it, but as far as I can tell, they all use Google. Please give me a counterexample and I will try out its algorithm, but I suspect it will show similar results to Google anyway.

Otherwise, a database that can provide the sort of esoteric results common on the 1990s internet would be more fun to use, probably more helpful too, and as Google continues to tighten its algorithm and purge its database accordingly, I think there will eventually be an opening for a new search engine to fill a sought-after niche (namely, being good) that Google is committed to neglecting.

Perhaps search engines are shit now because there are more websites, and so they can't keep databases of website data for every website like they used to (if they were ever doing that), but now we have machine learning algorithms like word2vec and paragraph2vec, which suggest that there is a way of learning an identifying vector for each website (site2vec?). That would not only compress website data into a relatively small identifying vector, but would also allow cool new vector-based search criteria. Word2vec learns a vector for each word in the training set documents, and these vectors appear to have actual vector properties representing the meaning of the words. So e.g., if V(word) is the vector for a given word, then scientists find that doing something like V(paris)-V(france)+V(Italy) is very close to V(rome), and V(king)-V(man)+V(woman) is very close to V(queen)...
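
For anyone who hasn't seen the vector arithmetic in action, here is a toy sketch. The two-dimensional vectors are made up by hand (one axis for "royalty", one for "gender") purely for illustration. Real word2vec vectors are learned from text and have far more dimensions, so treat this as a picture of the arithmetic, not of the training.

```python
import math

# Hand-made toy vectors: (royalty, gender). Not trained, illustration only.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.7, 1.0),
    "princess": (0.7, -1.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to V(a) - V(b) + V(c), skipping the inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

With these toy numbers V(king)-V(man)+V(woman) lands exactly on V(queen); with trained vectors it only lands nearby, which is why the lookup uses a nearest-neighbour search.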

So now imagine if every website were represented by a vector. You could use the crazy vector properties of site vectors to find webpages based purely on your criteria. Say we take chan boards as basis vectors with different "personalities". What would be the result of searching for V(breitbart)-V(/pol/)+V(/x/)? Or how about V(youtube)-V(/b/)+V(/sci/)? But of course there could also be a way to search the vectors based on content by first turning your search phrase into a vector suitable for comparison, V(phrase), and looking for the closest projection from V(phrase) to V(site) for all sites in the database.
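
A rough sketch of the V(phrase) to V(site) lookup, under heavy assumptions: the word vectors are hand-made, a phrase or site vector is just the average of its word vectors, and the site names and their text are hypothetical placeholders. A real system would learn all of these from crawled pages.

```python
import math

# Hypothetical hand-made word vectors over three made-up topic axes.
WORD_VECS = {
    "linux": (1.0, 0.0, 0.0),
    "kernel": (0.9, 0.1, 0.0),
    "recipe": (0.0, 1.0, 0.0),
    "pasta": (0.0, 0.9, 0.1),
    "ghost": (0.0, 0.0, 1.0),
    "haunted": (0.1, 0.0, 0.9),
}

# Placeholder sites, each described by a few words of "content".
SITES = {
    "tech-board.example": "linux kernel",
    "cooking.example": "pasta recipe",
    "paranormal.example": "haunted ghost",
}


def mean_vector(text):
    """Average the vectors of known words; None if no word is known."""
    vecs = [WORD_VECS[w] for w in text.lower().split() if w in WORD_VECS]
    if not vecs:
        return None
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(phrase):
    """Rank sites by cosine similarity between V(phrase) and V(site)."""
    query = mean_vector(phrase)
    if query is None:
        return []
    scored = []
    for site, text in SITES.items():
        site_vec = mean_vector(text)
        if site_vec is not None:
            scored.append((cosine(query, site_vec), site))
    return [site for _, site in sorted(scored, reverse=True)]


if __name__ == "__main__":
    print(search("ghost"))
```

The linear scan over every site is fine for a toy; at web scale you would presumably want an approximate nearest-neighbour index instead.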

So anyways, what would it take to do this? What does it take to host a search engine? I'll try to learn about this. I'm proficient in HTML/JS and know some PHP.

Storage space: I've read there are 1.5 billion sites in total. How big of a vector (bit dimensions) does it take to decompose a website by category (it doesn't have to be unique; say each vector is associated with 10 sites)? 1.5 billion sites is 150 million vectors...that's only 28 bits to distinguish between each vector (2^27 is about 134 million, just short), so let's say 30 bits per vector. 30 bits times 1.5 billion sites is ~6GB. That's 6GB to remember the entire internet.
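
Working the storage numbers through as a quick check. The inputs are the figures from the post (1.5 billion sites, 10 sites per vector, 30 bits stored per site). Note this only counts the bits needed to label distinct vectors; a learned embedding in the word2vec style is usually a few hundred floats per item, so read the result as a lower bound rather than a realistic size.

```python
def bits_to_index(count):
    """Smallest number of bits that gives `count` distinct IDs."""
    return max(1, (count - 1).bit_length())


TOTAL_SITES = 1_500_000_000   # figure quoted in the post
SITES_PER_VECTOR = 10         # assumption from the post
BITS_PER_SITE = 30            # rounded-up width chosen in the post

distinct_vectors = TOTAL_SITES // SITES_PER_VECTOR
min_bits = bits_to_index(distinct_vectors)
storage_bytes = TOTAL_SITES * BITS_PER_SITE // 8

if __name__ == "__main__":
    print(f"{distinct_vectors:,} vectors need {min_bits} bits each")
    print(f"{storage_bytes / 1e9:.3f} GB at {BITS_PER_SITE} bits per site")
```

So the minimum ID width comes out at 28 bits, and 30 bits per site over 1.5 billion sites is about 5.6 GB, which rounds to the ~6GB quoted.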

Processing: Obviously the hard part would be actually learning these vectors. You would have to keep up with the 140k websites launched per day, though you could certainly whittle this down a lot if you restricted yourself to, say, websites hosted in a particular country, or on a particular topic, or websites that have been online for over a year. Ultimately you may not need to process many more websites than there are total currently, so 1.5 billion. I have no idea how long this would take. I haven't even learned the word2vec algorithm yet.
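
A back-of-envelope for the processing side. The 140k/day and 1.5 billion figures are the ones from the post; the throughput of 100 sites per second is a made-up number chosen only to show how the estimate scales, not a measurement of anything.

```python
SECONDS_PER_DAY = 86_400


def keep_up_rate(new_sites_per_day):
    """Sites per second needed just to keep pace with new launches."""
    return new_sites_per_day / SECONDS_PER_DAY


def backlog_days(total_sites, sites_per_second):
    """Days to process an existing backlog at a given sustained throughput."""
    return total_sites / sites_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"keep up: {keep_up_rate(140_000):.2f} sites/s")
    # 100 sites/s is a hypothetical throughput, not a benchmark.
    print(f"backlog: {backlog_days(1_500_000_000, 100):.1f} days")
```

Keeping up with new sites is the cheap part (under two sites per second); the one-time backlog is what dominates.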

 No.12353

>>12352

>So anyways, what would it take to do this? What does it take to host a search engine? I'll try to learn about this. I'm proficient in HTML/JS and know some PHP.

Yeah, you're definitely not the person to create a search engine of any kind.

 No.12354

>>12353

If you actually knew more than I did about this topic, you would provide some context for why it's impossible, but you can't even do that, so you're obviously not the sort of person who can entertain such an idea in the first place.

 No.12355

>>12352

Databases suck, use nosql and no database.

As for the search results, I think even bing makes better image results than google now.

Google used to offer the best results back in 2009+- but now it's so shit that I don't even use it anymore.

First you need to set up a crawler, like a spider or botnet. Make your own, but I'm no expert. Second, it must not get blocked by cuckflare (One more step..) or other shitty detection tools.

From what I've heard the <meta> tags are where your crawlers should be looking but the problem is there's plenty of fake results that abuse the meta tags (like when searching for an Mp3 file and the top 10 results don't even have the fucking thing but instead offer you a virus adware toolbar).
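
For the meta tag part, a minimal sketch of what a crawler could pull out of a fetched page, using only Python's standard `html.parser`. The fetching itself (and getting past bot detection) is left out, and the sample HTML is invented for the demo.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the <title> text and named <meta> tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            # Lowercase the name so "Description" and "description" match.
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html):
    """Return (title, {meta name: content}) for an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta


if __name__ == "__main__":
    sample = ('<html><head><title>Free MP3</title>'
              '<meta name="keywords" content="mp3, free"></head></html>')
    print(extract_meta(sample))
```

Since anyone can write anything into these tags, they are probably best treated as one signal among several rather than as the thing to rank on.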

 No.12375

>>12355

Thank you. Making a crawler is definitely a good place to start. It didn't occur to me that there are services that would try to "protect" websites from my web crawler. Regarding metatag abuse, if I can actually apply ML to that metadata, then it should be possible to learn the signature of dubious sites, even if by manually applying cluster analysis to the resulting site vectors.
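
One hand-written feature such a model might consume, as a sketch: the fraction of a page's meta keywords that never show up in its visible text. This is a plain heuristic, not ML and not cluster analysis, and the function name and sample strings are made up for illustration.

```python
def keyword_mismatch(meta_keywords, body_text):
    """Fraction of comma-separated meta keywords absent from the body text.

    0.0 means every keyword appears in the page, 1.0 means none do.
    Pages without keywords score 0.0 (nothing to contradict).
    """
    keywords = [k.strip().lower() for k in meta_keywords.split(",") if k.strip()]
    if not keywords:
        return 0.0
    body = body_text.lower()
    missing = sum(1 for k in keywords if k not in body)
    return missing / len(keywords)


if __name__ == "__main__":
    print(keyword_mismatch("mp3, download, free", "install our toolbar today"))
    print(keyword_mismatch("linux, kernel", "the linux kernel mailing list"))
```

A score like this could be one dimension of the site vector, so that pages promising one thing and serving another end up grouped together.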

 No.12384

>>10802

Why would you use jewgle instead of a real search engine?

 No.12792

File: ffbab088a6d15c1⋯.png (76.5 KB,620x237,620:237,taydidnothingwrong.png)

>>12352

>scientists find that doing something like V(paris)-V(france)+V(Italy) is very close to V(rome), and V(king)-V(man)+V(woman) is very close to V(queen)...

Probably not totally different from what google does under the hood, but to be able to be explicit about it could be killer. Imagine being able to categorize "normienigger" content so that when you want to know about some political topic you're not bombarded with retard gibberish from buzzfeed and its ilk. Use -(V)normie to filter out social media, or -(V)soy to filter out pop culture— not necessarily all of it, but the obvious prolefeed. Imagine having a way to tell youtube "I need videos about V('puters n' sheeeeit) but don't want any V(streetshitters) speaking their unintelligible patois." Imagine google image search, except not subverted to work incorrectly.

Imagine being the dude who returned truth and beauty to the internet, just by offering a service that isn't fucked up or retarded by design.

>I'm proficient in HTML/JS

From what I see you can do just about anything with JS these days. There's bound to be a Tensorflow for it, or a translation layer to python.

 No.12793

SearX is a meme, if you use a public instance you're trusting a random person, if you host your own instance then it's the same as searching google directly from an IP you own, just use duckduckgo, if you don't trust them then use their .onion service so they can't track you even if they wanted to

https://3g2upl4pq6kufc4m.onion/

 No.12939

>>12793

searx allows searching from multiple search engines simultaneously and gets rid of the tracking urls search engines typically bundle. if you selfhost on a vps and share it with multiple people, you get to blend the traffic with others and avoid using your home ip. searx fits perfectly the threat model for average joe.
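
To show what getting rid of tracking urls can look like, here is a small sketch that drops common tracking parameters from a result link. This is not searx's actual code, and the parameter list (`utm_*`, `fbclid`, `gclid`) is an assumption picked for the example rather than a complete set.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; not an exhaustive list.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def strip_tracking(url):
    """Return `url` with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


if __name__ == "__main__":
    print(strip_tracking("https://example.org/page?id=7&utm_source=x&gclid=abc"))
```

Parameters that are part of the page address itself (like `id=7` above) are kept, so the cleaned link still points at the same page.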

 No.12983

Has anyone got this working properly on Dillo and/or w3m? I can search just fine but the cookies are broken for some reason. As in, they are generated wrongly/not conforming to my configuration.


