windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v241/Hydrus.Network.241.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v241/Hydrus.Network.241.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v241/Hydrus.Network.241.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v241/Hydrus.Network.241.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v241/Hydrus.Network.241.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v241.tar.gz
I had a good week. I fixed things and moved the duplicate search stuff way forward.
fixes and a note on cv
I've fixed the stupid 'add' subscription bug I accidentally introduced last week. I apologise again–I have added a specific weekly test to make sure it doesn't happen again.
With the help of some users, I've also updated the clientside pixiv login for their new login system. It seems to work ok for now, but if they alter their system any more I'll have to go back to it. Ideally, I'd like to write a whole login engine for the client that can log in to any site, which would make pixiv and everything else work with less duct tape and be easier to maintain.
For Windows users, I've updated the client's main image library (OpenCV) this week, and this new version looks to be more stable (it loads some files that crashed the old version). If you are on Windows and have 'load images with PIL' checked under options->media, I recommend you now turn it off–if you have a decent graphics card, your images will load about twice as fast.
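For the curious, what that option toggles is presumably something like the following sketch; the load_image function and its fallback logic are illustrative assumptions here, not the client's actual internals:

import cv2
import numpy
from PIL import Image

def load_image(path, use_pil=False):
    if not use_pil:
        # IMREAD_COLOR always yields a 3-channel BGR array
        image = cv2.imread(path, cv2.IMREAD_COLOR)
        if image is not None:
            # OpenCV decodes to BGR, so flip the channels to RGB
            return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # PIL is slower but copes with some malformed files that can trip up OpenCV
    return numpy.asarray(Image.open(path).convert('RGB'))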
duplicate files are now findable
I have written code to auto-find duplicate pairs and activated the buttons on the new duplicates page (which is still at pages->new search page->duplicates for now).
Dupe file display and filtering are not here yet, so if you are waiting for something more fun than some numbers slowly getting bigger, please hang in there for a little longer! But if you are interested in this stuff, please check it out and let me know how you get on.
The idea of this page is to:
1) Prepare the database to search for duplicate pairs.
2) Search for duplicate pairs at different confidence levels (and cache the results).
3) Show those pairs one at a time and judge what kind of dupe they are.
Parts 1 and 2 now work. If you are interested, I would appreciate you putting some time into them and giving me some numbers so I can design part 3 well.
Since originally introducing duplicate search, I have updated the 'phash' algorithm (which summarises how an image 'looks' for quick comparison) several times. I improved it significantly again this week and am now pleased with it, so I do not expect to change it further. As all existing phashes were generated with older, lower-quality versions of the algorithm, I have scheduled every single eligible file (jpgs and pngs) for phash regeneration. This is a big job–for me, it means about 250k files that need to be completely read again and have some CPU thrown at them. I'm getting about 1-2 thousand per minute, so I expect to be at it for something like three hours. This only has to be done once, and only for your old files–new files will get correct phashes as they are imported.
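For those who like to see how this works, here is a minimal sketch of the classic DCT-style phash, assuming a 64-bit hash built from a 32x32 greyscale thumbnail; the generate_phash function and its exact sizes are illustrative rather than the client's real code, but it shows where the median-vs-mean choice comes in:

import cv2
import numpy

def generate_phash(path):
    # load as greyscale and shrink to discard high-frequency detail
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    small = cv2.resize(image, (32, 32), interpolation=cv2.INTER_AREA)
    # the discrete cosine transform concentrates the broad 'look' of the
    # image into the top-left low-frequency coefficients
    dct = cv2.dct(numpy.float32(small))
    low_freq = dct[:8, :8].flatten()
    # thresholding against the median rather than the mean is less skewed
    # by a few extreme coefficients
    bits = low_freq > numpy.median(low_freq)
    # pack the 64 booleans into a 64-bit integer phash
    return sum(1 << i for i, bit in enumerate(bits) if bit)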
To save redundant tree rebalancing, I recommend you set the time aside and regenerate them all in one go. The db will be locked while it runs. The maintenance code here is still ugly and may hang your gui. If it does hang, just leave it running–it'll get there in the end.
Then, once the 'preparation' panel is happy, run some searches at different distances–you don't have to search everything, but maybe do a few thousand and write down the rough number of files searched and duplicate pairs discovered.
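To make 'distance' concrete: the search compares two phashes by how many of their 64 bits differ (the hamming distance), with 'exact match' meaning zero differing bits and the looser settings permitting a few more. A naive sketch, with hamming_distance and find_potential_pairs as illustrative names, might look like this; the actual client walks the search tree mentioned above rather than doing this brute-force n-squared loop:

def hamming_distance(phash_a, phash_b):
    # xor leaves a 1 bit wherever the two hashes disagree
    return bin(phash_a ^ phash_b).count('1')

def find_potential_pairs(phashes, max_distance):
    # compare every pair of phashes; 'exact match' would be max_distance 0
    pairs = []
    for i in range(len(phashes)):
        for j in range(i + 1, len(phashes)):
            if hamming_distance(phashes[i], phashes[j]) <= max_distance:
                pairs.append((i, j))
    return pairs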
I am very interested to know:
- How inconvenient was it doing the regen in real time? Approximately how fast did it run?
- At 'exact match' search distance, roughly how many potential duplicate pairs per thousand files does it find? What about 'very similar' and (if it isn't too slow) 'similar'?
- How much of this heavy CPU/HDD work would you like to run in the background on the normal idle routines?
- Did anything go wrong?
I'm still regenerating files as I write this, but I will update with my own numbers once I can. Thanks!
full list
- fixed the 'setnondupename' problem that was affecting 'add' actions on manage subscriptions, scripts, and import/export folders
- added some more tests to catch this problem automatically in future
- cleaned up some similar files phash regeneration logic
- cleaned up similar files maintenance code to deal with the new duplicates page
- wrote a similar files duplicate pair search maintenance routine
- activated file phash regen button on the new duplicates page
- activated branch rebalancing button on the new duplicates page
- activated duplicate search button on the new duplicates page
- search distance on the new duplicates page is now remembered between sessions
- improved the phash algorithm to use median instead of mean–it now gives fewer apparent false positives and negatives, but I think it may also be stricter in general
- the duplicate system now discards phashes for blank, flat colour images (this will be more useful when I reintroduce dupe checking for animations, which often start with a black frame)
- misc phash code cleanup
- all local jpegs and pngs will be scheduled for phash regeneration on update as their current phashes are legacies of several older versions of the algorithm
- debuted a cog menu button on the new duplicates page to refresh the page and reset found potential duplicate pairs–this cog should be making appearances elsewhere to add settings and reduce excess buttons
- improved some search logic that was refreshing too much info on an 'include current/pending tags' button press
- fixed pixiv login–for now!
- system:dimensions now catches an enter key event and passes it to the correct ok button, rather than always num_pixels
- fixed some bad http->https conversion when uploading files to file repo
- folder deletion will try to deal better with read-only nested files
- tag parent uploads will now go one at a time (rather than up to 100 as before) to reduce commit lag
- updated to python 2.7.13 for windows
- updated to OpenCV 3.2 for windows–this new version does not crash with the same files that 3.1 does, so I recommend windows users turn off 'load images with pil' under options->media if they have it set
- I think I improved some unicode error handling
- added LICENSE_PATH and harmonised various instances of default db dir creation to DEFAULT_DB_DIR, both in HydrusConstants
- misc code cleanup and bitmap button cleanup
next week
I'm going to collect my different thoughts on how to filter duplicate pairs into a reasonable and pragmatic plan and finally get this show on the road. I do not think I will have a working workflow done in one week, but I'd like to have something to show off–maybe displaying pairs at the least, so we can see how well the whole system is working at different distances.