>>5814
Question: What is the threshold for deciding whether a sub-array should be cropped for testing?
Compute the Gini impurity of each set of sampled pixels, and sort the sets from low to high impurity.
For each set, calculate the difference between its Gini impurity (its Gini index) and that of its neighbor in the sorted order.
Use the largest difference as the cut-off point: sets above it are considered to have high Gini impurity.
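A minimal Python sketch of this answer follows. It assumes each sampled pixel set is a list of discrete (e.g. quantized) pixel values and uses the standard Gini impurity 1 - sum(p_i^2). The function names are hypothetical, and placing the cut-off at the midpoint of the largest gap is an assumption; any value inside the gap separates the sets the same way.

```python
from collections import Counter


def gini_impurity(pixels):
    """Gini impurity 1 - sum(p_i^2) over the distinct values in one pixel set."""
    counts = Counter(pixels)
    n = len(pixels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def largest_gap_cutoff(values):
    """Sort the values and return the midpoint of the largest gap between neighbors."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values to find a gap")
    # Difference between each value and its right-hand neighbor in sorted order.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Assumption: the cut-off sits at the midpoint of the largest gap.
    return (ordered[i] + ordered[i + 1]) / 2


def high_impurity_sets(pixel_sets):
    """Indices of the sampled pixel sets whose Gini impurity is above the cut-off."""
    scores = [gini_impurity(s) for s in pixel_sets]
    cutoff = largest_gap_cutoff(scores)
    return [i for i, g in enumerate(scores) if g > cutoff]
```

For example, `high_impurity_sets([[0, 0, 0, 0], [0] * 7 + [1], [0, 1, 2, 3], [3, 1, 0, 2]])` returns `[2, 3]`: the impurities are 0, 0.21875, 0.75 and 0.75, so the largest gap lies between 0.21875 and 0.75.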
Question: How do you crop an image so that most of the noisy pixels are included in the sub-array?
Group neighboring noisy sample pixels into a single set, a "lump", and call each pixel in it a "lump pixel".
Count the lump pixels in each row and each column of the lump, and make a copy of both count arrays (keeping the originals in row and column order).
Sort both copies from low to high, calculate the difference between each value and its neighbor, and take the largest difference as the cut-off point, as with the Gini impurities above.
Group adjacent affected rows (those above the cut-off) and adjacent affected columns of the lump; the sub-array they span is the lump array, i.e. the crop.
Repeat for all lumps, and merge a smaller lump into a larger one if most of the smaller lump's pixels lie inside the larger one.
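The cropping steps can be sketched in Python as below. This assumes the noisy samples are already marked in a 2D boolean grid (`mask`), treats "neighboring" as 4-connected, emits one rectangle per pair of row run and column run that contains lump pixels, and handles the merge by dropping a crop when more than half of its area falls inside a larger kept crop. All of these choices, and every function name, are hypothetical. Rectangles are `(top, bottom, left, right)` with inclusive bounds.

```python
from collections import Counter, deque


def find_lumps(mask):
    """Group 4-connected noisy cells of a 2D boolean grid into lumps of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    lumps = []
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue, lump = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one lump
                r, c = queue.popleft()
                lump.append((r, c))
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            lumps.append(lump)
    return lumps


def affected_indices(counts):
    """Rows (or columns) whose lump-pixel count lies above the largest gap."""
    ordered = sorted(counts.values())  # sorted copy; `counts` keeps the positions
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps or max(gaps) == 0:
        return sorted(counts)  # no gap to split on: every row/column is affected
    i = max(range(len(gaps)), key=gaps.__getitem__)
    return sorted(k for k, v in counts.items() if v > ordered[i])


def adjacent_runs(indices):
    """Split sorted indices into runs of adjacent values as (first, last) pairs."""
    runs = []
    for i in indices:
        if runs and i == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], i)
        else:
            runs.append((i, i))
    return runs


def crop_lump(lump):
    """Rectangles (top, bottom, left, right) spanned by runs of affected rows/columns."""
    row_counts = Counter(r for r, _ in lump)
    col_counts = Counter(c for _, c in lump)
    rects = []
    for top, bottom in adjacent_runs(affected_indices(row_counts)):
        for left, right in adjacent_runs(affected_indices(col_counts)):
            # Assumption: keep only rectangles that contain at least one lump pixel.
            if any(top <= r <= bottom and left <= c <= right for r, c in lump):
                rects.append((top, bottom, left, right))
    return rects


def rect_area(rect):
    top, bottom, left, right = rect
    return (bottom - top + 1) * (right - left + 1)


def rect_overlap(a, b):
    height = min(a[1], b[1]) - max(a[0], b[0]) + 1
    width = min(a[3], b[3]) - max(a[2], b[2]) + 1
    return max(height, 0) * max(width, 0)


def merge_crops(rects):
    """Drop a crop when more than half its area lies inside a larger kept crop."""
    # Assumption: the smaller crop is absorbed (dropped), not unioned with the larger one.
    kept = []
    for rect in sorted(rects, key=rect_area, reverse=True):
        if not any(2 * rect_overlap(rect, big) > rect_area(rect) for big in kept):
            kept.append(rect)
    return kept


def crop_noisy_regions(mask):
    """All lumps -> candidate crops -> merged crops."""
    return merge_crops([rect for lump in find_lumps(mask) for rect in crop_lump(lump)])
```

On a 6x6 grid holding a 3x3 block at rows and columns 1 to 3 plus a one-pixel-wide tail at row 4, columns 3 to 5, `crop_noisy_regions` returns `[(1, 4, 1, 3)]`. All four rows hold 3 lump pixels each, so no row is cut, while the sorted column counts 1, 1, 3, 3, 4 have their largest gap between 1 and 3, so tail columns 4 and 5 are dropped.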
Question: How do you compare the crops of each image if there is more than one crop per image?
There are two formulas for combining the per-crop results into a single score:
1. Sameness by area affected = Sum(hash sameness of crop * area of crop) / Sum(area of crop), i.e. the area-weighted average of the crops' hash sameness.
2. Sameness by lump difference = Min(hash sameness of crop), i.e. the least similar crop decides.
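Both formulas can be written in Python as a short sketch. The source does not define "hash sameness"; here it is assumed to be a score in [0, 1], and the hypothetical `hash_sameness` helper computes it as one minus the normalized Hamming distance between two integer image hashes (for example, perceptual hashes of the two crops). Each crop is passed as a `(sameness, area)` pair.

```python
def hash_sameness(hash_a, hash_b, bits=64):
    """Hypothetical: 1 - normalized Hamming distance between two integer hashes."""
    return 1.0 - bin(hash_a ^ hash_b).count("1") / bits


def sameness_by_area(crops):
    """Sum(sameness * area) / Sum(area) over crops given as (sameness, area) pairs."""
    total_area = sum(area for _, area in crops)
    return sum(sameness * area for sameness, area in crops) / total_area


def sameness_by_lump_difference(crops):
    """Min(sameness): the least similar crop decides the image-level score."""
    return min(sameness for sameness, _ in crops)
```

With crops `[(1.0, 30), (0.5, 10)]`, the area formula gives (30 + 5) / 40 = 0.875, while the lump-difference formula gives 0.5: the first tolerates a small differing region, the second flags it.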