
/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.


New user? Start here ---> http://hydrusnetwork.github.io/hydrus/

Current to-do list has: 829 items

Current big job: finishing and polishing duplicate system


File: e2807513a817bf9⋯.gif (1.94 MB, 266x415, 266:415, e2807513a817bf92d68357b333….gif)

9e840d No.5775

Post your face when 20k duplicates

f9363e No.5779

File: 6c781d178f281c2⋯.gif (283.84 KB, 500x800, 5:8, 6c781d178f281c2e1930dc4f81….gif)

Try 600 duplicates for a 90-pic visual novel CG set that you have to go through manually to make sure a legit duplicate doesn't sneak through.


92a5f6 No.5812

>>5779

Enhancement:

Gather all those duplicates, crop them so that only the text part is measured, then run the deduplication process again. Problem solved.


427d69 No.5813

>>5812

Are you >>5680 by chance? Consider stopping smoking crack, it's genuinely hard to understand what you're talking about.


92a5f6 No.5814

>>5813

Okay, what I mean is this.

You take all those duplicates reported by the system,

make hashes with finer sampling than the de-duplication system usually uses,

and find out which sampled pixels or sub-arrays vary the most across the set.

The higher the gini-impurity or f(p)=4p(1-p), the more likely there is text in that part of the CG.

Now, crop those parts of each image out as separate entities, each linked to its original image.

Grab those clippings and run the de-duplication system again on them.

Now you can find which dialog boxes are duplicates and which are not.
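
A minimal sketch of the "find which samples vary the most" step, under some loud assumptions: each image is reduced to an equal-length list of 0/1 hash bits, and p is taken to be the fraction of images in the set that have a given bit set. The post does not define p, so that reading is a guess, and `bit_variation` is a hypothetical helper, not anything from hydrus. Note that 4p(1-p) is just the two-class Gini impurity 2p(1-p) scaled so it peaks at 1.

```python
def bit_variation(hashes):
    """Score each bit position by f(p) = 4p(1-p).

    hashes: list of equal-length sequences of 0/1, one per image (assumed format).
    p is the fraction of images whose bit is set at that position (assumption).
    Returns one score per position: 0.0 means every image agrees, 1.0 means a 50/50 split.
    """
    count = len(hashes)
    length = len(hashes[0])
    scores = []
    for i in range(length):
        p = sum(h[i] for h in hashes) / count
        scores.append(4 * p * (1 - p))
    return scores


# Four near-duplicates that only disagree on the last bit.
example_hashes = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 1, 1],
]
example_scores = bit_variation(example_hashes)
```

Positions with a high score are the candidates for cropping; positions scoring 0.0 are identical across the whole set.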


92a5f6 No.5815

>>5814

Question: What is the threshold for deciding whether a sub-array should be cropped for testing?

Take the gini-impurity of each sampled pixel set, and sort them from low to high.

Calculate the difference between each sampled pixel set's value and its neighbor's in that sorted order.

Use the largest difference as the cut-off point for deciding whether a gini-impurity counts as high.
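
The cut-off rule above can be sketched like this. It is one reading of the post: sort the scores, find the largest jump between neighbors, and split there. Placing the cut-off at the midpoint of that jump is my own assumption, and `largest_gap_cutoff` is a hypothetical name.

```python
def largest_gap_cutoff(values):
    """Return a cut-off placed in the middle of the largest gap between sorted values."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find a gap")
    best_index = 0
    best_gap = ordered[1] - ordered[0]
    for i in range(1, len(ordered) - 1):
        gap = ordered[i + 1] - ordered[i]
        if gap > best_gap:
            best_gap = gap
            best_index = i
    # Midpoint of the widest gap (assumption: the post only says "cut-off point").
    return (ordered[best_index] + ordered[best_index + 1]) / 2


example_values = [0, 2, 1, 9, 10]
example_cutoff = largest_gap_cutoff(example_values)
noisy = [v for v in example_values if v > example_cutoff]
```

Anything above the cut-off is treated as "high"; if the scores have no clear jump, this rule will still pick one, so it probably wants a sanity check in practice.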

Question: How do you crop images so that most of the noisy pixels end up inside the sub-array?

Lump neighboring noisy sample pixels together as a single set. Let's call the pixels in such a set "lump pixels".

Count the lump pixels in each row and each column of the aggregation, and make a copy of both counts.

Sort both copies from low to high, calculate the differences between neighboring cells, and create a cut-off point.

Lump the neighboring affected rows and columns of the aggregation together, and generate the lump array from them.

Repeat for all lumps, and merge a smaller lump into a larger one if the majority of the smaller lump's pixels fall inside the larger one.
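
A rough sketch of the lumping idea. To be clear, this swaps in a plain flood fill over 4-connected neighbors and takes one bounding box per lump, which is simpler than the row/column counting described above and is not the same procedure. The mask format and `lump_boxes` are assumptions for illustration.

```python
def lump_boxes(mask):
    """Group neighboring noisy cells into lumps and return one bounding box per lump.

    mask: 2D list of 0/1, where 1 marks a noisy sample (assumed format).
    Returns a list of (top, left, bottom, right) tuples, inclusive, in scan order.
    """
    height = len(mask)
    width = len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood fill from this cell to collect the whole lump.
            stack = [(y, x)]
            seen[y][x] = True
            top = bottom = y
            left = right = x
            while stack:
                cy, cx = stack.pop()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
example_boxes = lump_boxes(example_mask)
```

Each box is a candidate crop. Merging a small lump into a larger one, as described above, would be an extra pass over these boxes and is left out here.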

Question: How do you compare the crops of each image if there is more than one crop per image?

There are two different formulas for sorting this out.

1. Sameness by area affected = Sum(Hash sameness of crop * Area of crop) / Sum(Area of crop)

2. Sameness by lump difference = Min(Hash sameness of crop)
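
The two formulas written out as code, assuming each crop comes as a (sameness, area) pair with sameness between 0 and 1. That pair format and both function names are made up for the example.

```python
def sameness_by_area(crops):
    """Area-weighted average: Sum(sameness * area) / Sum(area)."""
    total_area = sum(area for _, area in crops)
    return sum(sameness * area for sameness, area in crops) / total_area


def sameness_by_lump_difference(crops):
    """Worst case: the least similar crop decides the result."""
    return min(sameness for sameness, _ in crops)


# One large crop that matches perfectly and one small crop that half matches.
example_crops = [(1.0, 300), (0.5, 100)]
by_area = sameness_by_area(example_crops)
by_lump = sameness_by_lump_difference(example_crops)
```

The first formula lets a big matching region outvote a small differing one; the second flags the pair as different as soon as any single crop differs, which is probably closer to what you want for dialog boxes.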


ac60d3 No.5816

>>5814

CG sets don't have text, you utter autist. They have tons of minor variations of images they cycle through based on the dialogue.


92a5f6 No.5817

>>5816

That is what I said. Dialog texts!


248722 No.5818

>>5817

No, that's not what you said. CG sets often don't have dialog text at all.


f59d35 No.5819

>>5818

Wait, I thought CG was just graphic novel games with dialog text, preset backgrounds and minor changes in the characters' facial expressions.

The only time I encountered such a thing was on exhentai. I am just a junior programmer who has some knowledge of how things could be done.


ac60d3 No.5820

>>5819

who the fuck would store the text of a visual novel in images


f59d35 No.5823

>>5820

Everyone? Then explain the words of dialog inside the image.


f601c1 No.5824

File: a5554b6951776ae⋯.jpg (224.2 KB, 960x540, 16:9, 227ce89b3012aa66825fa58cb5….jpg)

File: ce1f1a1b0603eb8⋯.jpg (222 KB, 960x540, 16:9, 7fb87b54b2cb4432da81bf7647….jpg)

>>5823

Go to your favorite booru and search for "game cg". I'll wait.

Basically, if there's text, that's a screencap, and is missing part of the picture.

For instance, these two pics I randomly found. I don't see any dialogue. There wasn't any dialogue in any of the pictures I found.


7c43fc No.5835

>>5824

If it's not based on text bubble differences, then it is going to be tough to sort, since the differences can happen in different areas of the picture.

Either increase sampling or implement machine-learning-esque algorithms to find the "Mother" image…

OH BOY


f601c1 No.5839

>>5835

I don't know what was going on in the conversation earlier, but the duplicate search already finds these sorts of pairs easily.


ac60d3 No.5843

>>5839

I don't think OP knows what's going on or what he's talking about either.


92a5f6 No.5854

>>5839

>>5843

Originally, I expected CG sets to be just images that differ in the text of their speech bubbles, kind of like one of those "fill in the blank" templates.

The solution for that case is finding the hash variance in the specific areas where the text or speech bubble is, then isolating that area and re-running the hash on it for easier comparing.

Now, with the speech bubbles out of the way, things get more complicated. The basic solution is to increase the hash length, i.e. measure minor differences at a finer grain, and have a UI that emphasizes the affected areas for easier comparing. This can only work for pairs of images.
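
A toy illustration of the "longer hash" point. It uses a simple average hash (block-average the image down to an n x n grid, then set each bit by comparing the cell to the mean), which is my stand-in for the example and not necessarily what hydrus's duplicate search uses. All names and the tiny 8x8 test images are hypothetical.

```python
def average_hash(pixels, n):
    """Average hash of a square grayscale image at n x n resolution.

    pixels: square 2D list of grayscale values; side length must be divisible by n.
    Returns n*n bits in row-major order: 1 if the cell's mean is above the overall mean.
    """
    block = len(pixels) // n
    cells = []
    for cy in range(n):
        for cx in range(n):
            total = 0
            for y in range(cy * block, (cy + 1) * block):
                for x in range(cx * block, (cx + 1) * block):
                    total += pixels[y][x]
            cells.append(total / (block * block))
    mean = sum(cells) / len(cells)
    return [1 if cell > mean else 0 for cell in cells]


def differing_cells(hash_a, hash_b):
    """Indices of the bits where two hashes disagree (the areas a UI could highlight)."""
    return [i for i, (a, b) in enumerate(zip(hash_a, hash_b)) if a != b]


# Image A: bright top half, dark bottom half.
image_a = [[200] * 8 for _ in range(4)] + [[50] * 8 for _ in range(4)]
# Image B: same, plus a small bright 2x2 patch in the bottom-right corner.
image_b = [row[:] for row in image_a]
for y in (6, 7):
    for x in (6, 7):
        image_b[y][x] = 250

coarse_diff = differing_cells(average_hash(image_a, 2), average_hash(image_b, 2))
fine_diff = differing_cells(average_hash(image_a, 4), average_hash(image_b, 4))
```

At 2x2 the small patch gets averaged away and the two hashes match; at 4x4 exactly one cell differs, and its index tells you which part of the image to highlight for the pair.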

P.S. I am not OP of this thread.


9e840d No.5860

>>5843

what the fuck you want?


ff8521 No.5877

File: df50c3df9dbc0a1⋯.jpg (17.72 KB, 235x174, 235:174, copy all tags.jpg)

Is there a way to copy all the tags and paste them into an untagged duplicate? It doesn't seem to work from this menu, unless I'm retarded and doing it wrong or missing something.


f601c1 No.5878

>>5877

Use the "paste tags" button? If you're trying to ctrl+v into the text box it won't work.


ff8521 No.5882

>>5878

Oh right, found it, thanks.



