/tech/ - Technology


File: world database.png (284.32 KB, 512x512)


No.959980 >>960141 >>961652

Due to the unreliable nature of information on the Internet, I've been thinking about something I haven't seen mentioned anywhere: a decentralized universal database of everything. Is there a project attempting to do such a thing?

The idea here is to fill it with data from all around. It will not be a Wikipedia, though it will surely link to it quite a lot. If a datapoint (say, a URL or an article) has conflicting versions, then all of those can be collected, but the validation system will ensure no one can store a fake quote about someone and have it propagated to the nodes.

Basically the tenets would be along these lines:

* Mostly decentralized, with some servers acting as coordinators to reduce the amount of spam/crap. If those main servers go down, the network can still function.

* Many UIs, both native and web, easy enough for normies not to fuck up anything other than the info they pump into the DB.

* A system of reviews/validation in which the whole community can rate the trustworthiness of the data.

* Sharding the whole DB into manageable chunks. Nodes can choose how many GB of data they want to store and how much bandwidth they're willing to provide.

* Maybe use blockchain to validate new blocks of data ingested into the system.

* Hashes of all media stored in it or referenced by it (see the sketch after this list).

* Autochecks for major websites. A stored tweet will automatically get compared to the one stored on Twitter's servers. Same for Facebook, YouTube, Wikipedia, etc.
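
To make the hashing/ratings tenets concrete, here's a minimal Python sketch of what a single stored datapoint could look like. Every name in it is made up for illustration, not taken from any existing project:

import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class Datapoint:
    source_url: str   # where the quote/article was captured from
    content: str      # the raw text itself
    captured_at: str  # ISO-8601 timestamp of capture
    ratings: list = field(default_factory=list)  # community trust votes, e.g. +1/-1

    def content_hash(self) -> str:
        # Hash over the immutable fields only; this is what nodes would
        # compare to detect tampering after the fact. Ratings stay mutable.
        payload = json.dumps(
            {"url": self.source_url, "text": self.content, "at": self.captured_at},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode()).hexdigest()

dp = Datapoint("https://example.com/article", "some quoted text", "2018-06-01T12:00:00Z")
print(dp.content_hash())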

Anyway, I hope you get the gist of it. Of course, I'm already running into many problems, such as my lack of knowledge of databases and the time it will take to create all this (not to mention that I'm almost a pajeet in programming proficiency). Scalability is also a huge issue that I have no clue how to overcome.

I know there are some technologies that will really help create this universal database, such as IPFS and torrents to propagate confirmed chunks. Maybe even stuff like Storj or Ethereum to refer back to in case of sabotage (e.g., periodically check hashes stored in Ethereum when there are conflicting versions).
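
As a sketch of that periodic check, assuming the anchored hash has already been read back from the chain somehow (that lookup is out of scope here):

import hashlib

def chunk_is_intact(chunk: bytes, anchored_hash: str) -> bool:
    # True if the locally stored chunk still matches the hash that was
    # anchored (e.g. in an Ethereum transaction) at ingest time.
    return hashlib.sha256(chunk).hexdigest() == anchored_hash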

The rationale behind this is that I'm fucking sick of not being able to have even a basic semblance of confidence every time I read something on the Internet.

I know there's stuff like Everipedia, but that's not what I'm aiming for here. This is not an encyclopedia. This is a reference for all (politically) relevant datapoints that is guaranteed, with a decent level of confidence, not to have been tampered with since it was stored; and if there's crap stored in it, it has been rated as such.

 No.959983>>959994

Every time some retard "idea guy" like you talks about shit like this, they are utterly incompetent. The people that can actually code and do practical shit could (and should be able to) think your shitty thread up in 2 seconds; it's actually doing it that's HARD. Go grab a book and start implementing that shit yourself. Dickhead.


 No.959994>>959997 >>960373

>>959983

>Go grab a book

Which one?


 No.959997>>960001

>>959994

Depends on what you currently know and what knowledge you want to end up with. Just pick one up that looks interesting; if it sucks, so be it, get another one; if it's good, that's good for you. All books are available for free. I still can't understand how people can consciously decide not to read when they clearly have an interest. Use the Internet, for fuck's sake; it's probably your greatest asset at the stage you're at right now.


 No.960001

>>959997

I'm constantly reading. And I'm aware implementing is the hard part. I'm also aware this will be a years-long project. I'm also willing to pay someone for the parts I don't care about learning how to implement. But I don't really know where to start.

What kind of DB would you use? Would you go for SQL or NoSQL? The only thing that comes close to what I'm picturing is something from Apache, but even there there are plenty of DB systems.

Should I learn Cassandra as a starting point? I'm already familiar with SQL but I don't think it will be that useful here. I don't want to start pumping stuff into a random MySQL DB only to realize down the road it was never the proper tool for this task.


 No.960024>>960029 >>961685

OP, everything you want can be done with IPFS, except the automatic caching of web content. Someone needs to make a webcrawler that crawls and stores webpages as IPFS/IPNS addresses.


 No.960027>>960029

We used to talk about this in the early '90s in tech, before the web caught on. The idea was that you could send out 'agents' (small pieces of code) that could look through another computer's files for desired information and report back. You'd be charged for the resources used by the agent on the machine it runs on, and you'd pay the machine's owner. So you could do a global search of any complexity with enough money. Unsurprisingly, the professor I worked with on this for a bit was a Jew, and everything revolved around how to implement decentralized payment. You could probably dig up those old CS papers and have a look, since with buttcoins we now have the missing piece.


 No.960029

>>960024

This project will probably have IPFS as a key component, but it will also contain metadata about files or resources too big to cache completely. The ratings system (which will help decide whether something is true or fake) is also not implemented in IPFS as far as I know.

But yeah, maybe what I'm thinking of can just be extended from IPFS instead of being a standalone project.

>>960027

Any idea how I can look for those? Any keywords or names would be appreciated.


 No.960060>>960069

sounds like IPFS


 No.960066>>960069

You just described exactly what Freenet is.

/thread


 No.960068>>960069

why does it have to be universal and a database? fuck you


 No.960069>>960072 >>960139

>>960060

>>960066

Do you understand what a database is or am I failing in my explanations?

Please show me how I can query all Trump quotes said between 2003 and 2008, with references to all of them, in IPFS or Freenet.

>>960068

It doesn't have to be a database. Open to suggestions. Anything to share?

It has to be universal because it will be open to anybody except malicious actors.


 No.960072>>960080

>>960069

>show me how I can query

Easy. Have the person who initially stores the tweets create an index mapping each tweet to metadata (like a timestamp). You can then use the index to query the info. It's not possible to query the information directly, because data is identified only by its hash; hash-based databases are key-value databases.

What you really want in your database is an index of tag metadata for each file. At that point it all basically becomes a JSON datastore.
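
A toy version of that, with a plain dict standing in for the hash-addressed store (all names here are illustrative):

import hashlib

store = {}   # content-addressed store: sha256 hex -> text (stand-in for IPFS)
index = []   # metadata index: one record per stored item

def put(text, author, timestamp):
    h = hashlib.sha256(text.encode()).hexdigest()
    store[h] = text
    index.append({"hash": h, "author": author, "timestamp": timestamp})
    return h

def query(author, start, end):
    # All stored texts by `author` with start <= timestamp < end
    # (ISO date strings compare correctly as plain strings).
    return [store[rec["hash"]]
            for rec in index
            if rec["author"] == author and start <= rec["timestamp"] < end]

put("example quote", "trump", "2005-03-14")
print(query("trump", "2003-01-01", "2009-01-01"))

The store answers "give me the content for this hash"; everything about dates, authors, and tags lives in the index, which is exactly the part IPFS doesn't provide for you.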


 No.960080>>960107

>>960072

Right, but what I mean is that I can't do that right now querying IPFS because it lacks such a function. And that's the kind of functionality I'm looking to create with this universal database. As I understand it, IPFS doesn't contain a queryable database of atomic datapoints related to non-automatable metadata.

I don't want to reinvent the wheel though so thanks for your input.


 No.960107

>>960080

>I can't do that right now querying IPFS because it lacks such a function

I just told you how to do it. All you need to do is create a metadata index of files you add.

>atomic datapoints related to non-automatable metadata

Nigger you have no clue what you're talking about.


 No.960128

Bots galore.


 No.960139>>960142

>>960069

>Please show me how I can query all Trump quotes said between 2003 and 2008, with references to all of them

So let's say the database you envision is complete. You have access to it, and this information is what you want: all Trump quotes said between 2003 and 2008, with references to all of them. Is that the exact query you enter into your database? If not, what is? And what do you expect the database to return to you?


 No.960141

>>959980 (OP)

Wouldn't it be much easier to start from the torrent format and just extend it to work better in cases where you want to work with a local torrent without having to copy the whole thing, and to implement a way to authenticate versions of files that supersede each other, à la packages?


 No.960142

>>960139

I just figured out a way for OP to do that. Use IPFS to store the content, but then use >>>/hydra/ to sort the content. For example, the Trump quotes: you would use a giant webcrawler to obtain the data off the internet, which you would then feed into hydra, which automatically tags and sorts it (such as by Trump quotes), and then store it using IPFS.

Now this is a gigantic undertaking, as you would have to crawl a lot of pages, almost NSA levels of storage, to store the initial IPFS hash. You would have to make your own IPFS CDN, or several, based on content, such as an IPFS CDN for Trump quotes, or one for cookbooks, etc. With hydra automatically sorting it, all you have to do is download the hydra metadata files and look for what you want, then download it using IPFS.
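
A rough Python sketch of that crawl -> tag -> store step, assuming a local IPFS daemon exposing its HTTP API on the default port; the tag list here is just a stand-in for what hydra would produce:

import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"   # assumed local daemon, default port

def store_in_ipfs(content):
    # Add raw bytes to the local IPFS node and return the resulting hash.
    r = requests.post(f"{IPFS_API}/add", files={"file": content})
    r.raise_for_status()
    return r.json()["Hash"]

def crawl_and_store(url, tags):
    # Fetch one page, pin its content into IPFS, and emit a metadata record
    # of the kind a tag-based index would hold.
    page = requests.get(url, timeout=30)
    page.raise_for_status()
    cid = store_in_ipfs(page.content)
    return {"url": url, "cid": cid, "tags": tags}

print(crawl_and_store("https://example.com", ["trump_quotes"]))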


 No.960143>>960145 >>960161

>>960142

Whoops, I fail. It is called >>>/hydrus/ and not hydra. The developer of hydrus is doing it as a pet project, though, and it is meant to sort anything file-wise. From their website:




The hydrus network client is a desktop application written for Anonymous and other internet-enthusiasts who have large media collections. It organises your files into an internal database and browses them with tags instead of folders, a little like a *booru on your desktop. Tags and files can be anonymously shared through custom servers that any user may run. Everything is free, nothing phones home, and the source code is included with the release. It is developed mostly for Windows, but reasonably functional builds for Linux and OS X are available.
Currently importable filetypes are:

images - jpg, gif (including animated), png (including animated!) and bmp
audio - mp3, flac, ogg and wma
video - webm, mp4, mpeg, flv and wmv
misc - swf, pdf, zip, rar, 7z

I am sure it supports text too, since it supports PDF.


 No.960145>>960146

>>960143

So you could webcrawl all of Trump's Twitter history and sort it into a tag called "trump_quotes", for example. Then upload all that history to IPFS and upload the hydrus metadata file with it. All the user has to do is install IPFS or a browser addon for IPFS, then install hydrus and download the hydrus database file for Trump quotes.


 No.960146>>960161

>>960145

But it gets better, because you aren't limited to one tag. Say you wanted to sort Trump quotes by type. You could tag it "trump_quotes" and also make a tag for "funny" and "random" and "emotional" or whatever else you want. This sorts images too, so you could mix those in along with all the files hydrus supports.

You have a problem, though: you need the storage space for downloading all those files with trump_quotes in them. You could sidestep this by making a program that hashes the trump_quote file (text or imagery, say) and then, when a user requests the hash from IPFS, downloads it from the clearnet on the fly. That would be exchanging storage space for internet bandwidth and speed. That is OK as long as you don't have more users than your network connection can handle wanting different things at the same time.
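
That trade-off is easy to sketch: keep only the hash plus the clearnet URL, fetch on demand, and reject the content if it no longer matches. Illustrative Python, not an existing IPFS feature:

import hashlib
import requests

def fetch_verified(url, expected_sha256):
    # Pull the content from the clearnet on the fly, but only serve it if
    # it still matches the hash recorded when it was first indexed.
    body = requests.get(url, timeout=30).content
    if hashlib.sha256(body).hexdigest() != expected_sha256:
        raise ValueError(f"{url} no longer matches its recorded hash")
    return body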

Your big problem is getting enough people interested in "trump_quotes" to download the hydrus metadata file from IPFS and then download the quote itself. Since IPFS distributes the data, you no longer have to host the quote itself; you can just link to it over the network as a fallback CDN. But you need people to use it on something they care about, like porn, for it to be shared in the first place.

I could see this working for very popular niches (oxymoron, I know), since users have to be technically apt, or you have to bundle IPFS+hydrus+GUI in a .exe for normalfags to even use it.


 No.960160

You want a futuristic database, with infinite depth and instant querying?

Sounds nice, but don't you think an app that lets you send nudes and funny pictures in realtime is more interesting?


 No.960161

>>960143

>>960146

Thanks for all the info. I'm really happy someone has been working on it for some time now.

>You have a problem, though: you need the storage space for downloading all those files with trump_quotes in them.

Here's the thing: I'm not interested in hosting video or images in this database. I'm mostly interested in text, and text is easily compressible. I bet all of Trump's tweets can fit on a 7-zipped floppy disk.
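
Back-of-the-envelope check, with the tweet count and compression ratio as rough guesses rather than measurements:

tweets = 40_000      # rough order of magnitude for one prolific account
avg_bytes = 140      # old tweet length limit
raw = tweets * avg_bytes       # ~5.6 MB of raw text
compressed = raw / 5           # plain text often compresses ~5:1 or better
floppy = 1_440_000             # 1.44 MB floppy
print(compressed <= floppy)    # True: ~1.1 MB, so it plausibly fits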

>I could see this working for very popular niches, oxymoron I know.

Those exist and are willing to host content for free.

>or you have to bundle IPFS+hydrus+GUI in a .exe for normalfags to even use it.

Gonna study hydrus now.


 No.960201>>960425


>I want to build a spaceship

>I know better how to make one

>Someone tell me how to do it

>let me use jargon that I don't understand

That larp...


 No.960251>>960425

>"sharding"

>blockchain

>IPFS

lol


 No.960373>>960425

>>959994

"Cocksucking For Retards"


 No.960425>>960448

>>960201

>>960251

>>960373

Cut him some slack; he seems willing to learn, which is better than most.

<inb4 "white knight", I'll use your ass as my spoils of war faggot

OP, check out https://archive.org/ too, they've probably already asked and answered some of your same questions.


 No.960448

>>960425

I guess they first need to implement the Internet Archive-to-IPFS thingie: https://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK

For the time being I think I'll focus on hydrus. It has a lot of what I want, even if it's currently buggy or feature-incomplete. Its dev also updates it at a more than reasonable pace, which is great.


 No.961652

>>959980 (OP)

There's a project called "tauchain" that approximates what you are aiming at. It's not just data, but logic. It uses blockchain for consensus and to incentivize its population. I haven't checked the status of the project in years, but a quick Google search shows that it's been rolled into one of those generic crypto-coin ICO-style websites.


 No.961685

>>960024

Even a few years ago I saw multiple projects based around donating your node to a collective for implicit distribution. IPFS nodes are easy to control and have multiple methods of connecting peers, directly or via pubsub groups, publicly via the DHT and privately via libp2p sockets.

You could easily build a Freenet style distribution system around IPFS and its pubsub system.

Publish a stream of hashes with low peer counts, have nodes subscribe to this, and do some evaluation to determine whether they should store the data themselves, most likely based on local metrics.

>new hash posted

>locally resolve peercount, if it's low proceed

>do geoip resolution on peers to see if it would be worth storing in this region, distribute evenly globally

Or really you could do it however you wanted.
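
A rough Python sketch of that evaluation step against the IPFS daemon's HTTP API; pubsub subscription and GeoIP are left out, the endpoint is routing/findprovs on recent daemons (dht/findprovs on older ones), and the threshold policy is the part you'd design yourself:

import json
import requests

IPFS_API = "http://127.0.0.1:5001/api/v0"

def provider_count(cid, limit=20):
    # Stream routing query events and count provider records (Type 4),
    # capped at `limit` providers.
    r = requests.post(f"{IPFS_API}/routing/findprovs",
                      params={"arg": cid, "num-providers": limit}, stream=True)
    count = 0
    for line in r.iter_lines():
        if not line:
            continue
        msg = json.loads(line)
        if msg.get("Type") == 4:  # provider record
            count += len(msg.get("Responses") or [])
    return count

def maybe_replicate(cid, threshold=3):
    # Pin the CID locally if too few peers are currently providing it.
    if provider_count(cid) < threshold:
        requests.post(f"{IPFS_API}/pin/add", params={"arg": cid}).raise_for_status()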

I have a feeling that people are going to fork the Filecoin project for this purpose. It would give you everything you need to coordinate storage and verify that data is actually being stored and distributed, but you could remove the tokens/cost and have your own metrics for choosing nodes (instead of the cheapest bidder).



