
/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.


New user? Start here ---> http://hydrusnetwork.github.io/hydrus/

Current to-do list has: 714 items

Current big job: finishing off duplicate search/filtering workflow


File: 1469302448334.jpg (66.25 KB, 500x585, 100:117, naughty02a.jpg)

b01640 No.3247

http://observer.com/2016/06/mediachain/

It seems like they're building a database of hashes for artwork so it can be identified even if it's been cropped, scaled, watermarked, recompressed or otherwise changed.

>A new company operating out of a remote Brooklyn warehouse aims to make it easy to know who made something, even if the part shared on Tumblr or Facebook was only cropped out of a larger work. Mediachain is building a system that can identify creative work (visual, musical and literary) around the web and easily display the metadata from its inception. If it all works out, the anarchic distribution of creative work across social media will turn those posts into a vector for discovering the creators behind the work.

>The team takes a bit of a nod from BitTorrent. That network created a unique hash for each piece of media, which Mediachain co-founder Denis Nazarov referred to as “content addressing.” Each exact copy of that content could be found with its unique hash, an ID made from the precise bits and bytes of the original file.

>But if the goal is to maintain attribution, the problem with this approach is that files change as they propagate online. “You can’t rely on the same exact hashing functions that bitcoin or BitTorrent relies on,” co-founder Jesse Walden said during our conversation in Mediachain’s offices. “Mediachain relies on technology like Shazam or Google Image Search.”

>So Mediachain retains the insight of distributed data from BitTorrent, but updates the means of finding it. Shazam is the software that can recognize music by listening through your microphone. The basic idea behind it can be applied to visual content, as well. Mediachain takes that signature and turns it into their own hash, to make a work findable. Nazarov called their approach “concept addressing.” This way, Mediachain should be able to recognize a work, even when it has changed a little bit.

The hashes are then stored in IPFS. Might want to keep an eye on this project.

8035f8 No.3253

File: 1469390372183.jpg (39.72 KB, 420x582, 70:97, 9f4a56cd23aa8ebbfc291de192….jpg)

This is interesting and neat!

It looks like they use a DCT for their phash (perceptual hash), or at least as one building block of their system, which is how hydrus does it atm. You basically describe an image as a series of true/false answers about common waveforms and then compare two phashes by how many of those answers differ.
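
For anyone who wants to see the idea in code, here is a minimal sketch of a DCT-based phash in plain Python. It follows a common recipe (DCT of a small grayscale image, keep the low-frequency corner, threshold against the median) and is not taken from hydrus or Mediachain; the names `dct_2d`, `phash_bits` and `hamming`, the 32x32 input size and the 8x8 corner are all assumptions for illustration. It expects the image to be already decoded and downscaled to a square grid of grayscale values.

```python
import math


def dct_2d(pixels):
    """Naive 2D DCT-II of a square grayscale matrix (list of lists)."""
    n = len(pixels)
    # cos_table[k][x] is the k-th cosine waveform sampled at position x.
    cos_table = [
        [math.cos(math.pi * (2 * x + 1) * k / (2 * n)) for x in range(n)]
        for k in range(n)
    ]
    # The 2D transform is separable: transform rows first, then columns.
    rows = [
        [sum(row[x] * cos_table[u][x] for x in range(n)) for u in range(n)]
        for row in pixels
    ]
    return [
        [sum(rows[y][u] * cos_table[v][y] for y in range(n)) for u in range(n)]
        for v in range(n)
    ]


def phash_bits(pixels, hash_side=8):
    """Return a hash_side*hash_side bit integer: one true/false bit per waveform."""
    coeffs = dct_2d(pixels)
    # Keep only the low-frequency corner, which survives rescaling and recompression.
    low = [coeffs[v][u] for v in range(hash_side) for u in range(hash_side)]
    # Median of everything except the DC term (low[0]), which only encodes brightness.
    rest = sorted(low[1:])
    median = rest[len(rest) // 2]
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical 32x32 test pattern standing in for a downscaled image.
base = [[(3 * x * x + 7 * y + x * y) % 128 for x in range(32)] for y in range(32)]
# The same pattern with contrast doubled; this should not flip any bit.
scaled = [[2 * p for p in row] for row in base]
print(hamming(phash_bits(base), phash_bits(scaled)))
```

Two images then count as likely duplicates when the Hamming distance between their hashes falls under some threshold, which has to be tuned by hand.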

I'll be working on faster search for this in the near future (grouping large amounts of media by phash 'hamming' distance is breddy complicated), and I'd like eventually to add sub-phash checking for character faces or other 'interesting' areas of an image.
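
To show why the search side is the hard part: comparing every phash against every other one is quadratic, so you want an index that can prune most candidates. One well-known option for Hamming distance is a BK-tree, sketched below. This is only an illustration of the general technique, not a description of what hydrus does or will do; the `BKTree` class and its methods are hypothetical names.

```python
def hamming(a, b):
    """Number of bit positions where two integer hashes differ."""
    return bin(a ^ b).count("1")


class BKTree:
    """Metric tree: each child hangs off its parent under their exact distance."""

    def __init__(self):
        self.root = None  # a node is [value, {distance: child_node}]

    def add(self, value):
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = hamming(value, node[0])
            if d == 0:
                return  # identical hash already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]
                return
            node = child

    def search(self, query, max_distance):
        """Return sorted (distance, value) pairs within max_distance of query."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = hamming(query, value)
            if d <= max_distance:
                results.append((d, value))
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - max_distance, d + max_distance] can contain a match.
            for edge, child in children.items():
                if d - max_distance <= edge <= d + max_distance:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for h in [0b0000, 0b0001, 0b0011, 0b1111, 0b1000]:
    tree.add(h)
print(tree.search(0b0000, 1))
```

The pruning works well for small search radii and gets weaker as the allowed distance grows, so the benefit depends on how loose a match you accept.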


fff3aa No.3278

>>3253

There are a lot of phash libraries out there, and some have been mentioned on this board before. Better dupe detection is always nice.



