windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v358/Hydrus.Network.358.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v358/Hydrus.Network.358.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v358/Hydrus.Network.358.-.OS.X.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v358/Hydrus.Network.358.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v358.tar.gz
I had a great week doing duplicates work and fixing bugs.
duplicates
I split this big duplicates storage overhaul into three jobs, and this week marks the third and final job done. Like alternates and duplicates information, potential pairs are now stored in a unified and more efficient way.
On the front end, you may notice your potential pairs queue shorten again this week. It will also shrink faster as you process in the duplicate filter, which will now present more 'useful' duplicate pairs first and apply your decisions more intelligently at the db level.
All the code is simpler except for one key area. If you notice that 'show some random potentials', the duplicate filter, or the potential counts take way too long to load, please let me know about your situation. It is possible I will have to revisit this complicated 'join', although in my tests it is performing well.
Also, I have added a new record that stops alternate pairs coming up in the duplicate filter repeatedly, as some users have experienced. These relationships are now more concrete, and this concreteness is plugged into duplicate merge operations and so on. You may see one more round of alternates appearing, and then they will be saved properly.
Now everything is stored on the new system, there are two main jobs remaining: re-adding various administrative commands like remove/dissolve to properly undo relationships, and adding some options and improvements to the duplicate filter workflow.
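As a rough illustration of the idea above (a toy sketch, not the actual hydrus db code), potential pairs can be keyed by duplicate group rather than by file, with a search distance attached. The class and method names here are hypothetical, and keeping the smallest distance when two pairs collapse into one on merge is an assumption made for the example.

```python
from typing import Dict, List, Tuple


class PotentialPairs:
    """Toy store of potential-duplicate pairs keyed by group id, not file id."""

    def __init__(self) -> None:
        # (smaller_group_id, larger_group_id) -> smallest search distance seen
        self._pairs: Dict[Tuple[int, int], int] = {}

    @staticmethod
    def _key(a: int, b: int) -> Tuple[int, int]:
        return (a, b) if a < b else (b, a)

    def add(self, group_a: int, group_b: int, distance: int) -> None:
        if group_a == group_b:
            return  # a group is never a potential duplicate of itself
        key = self._key(group_a, group_b)
        # Assumption: if the pair is already known, keep the closer distance.
        self._pairs[key] = min(distance, self._pairs.get(key, distance))

    def merge_groups(self, keep: int, absorb: int) -> None:
        """Two groups were set as duplicates: fold 'absorb' into 'keep'."""
        old = self._pairs
        self._pairs = {}
        for (a, b), distance in old.items():
            a = keep if a == absorb else a
            b = keep if b == absorb else b
            # add() culls the now-internal pair and merges repeated pairs
            self.add(a, b, distance)

    def queue(self) -> List[Tuple[int, int, int]]:
        """Return (distance, group_a, group_b) tuples, closest pairs first."""
        return sorted((d, a, b) for (a, b), d in self._pairs.items())


store = PotentialPairs()
store.add(1, 2, 0)
store.add(1, 3, 4)
store.add(2, 3, 2)
store.add(3, 4, 6)
queue_before = store.queue()

store.merge_groups(keep=1, absorb=2)  # the user says groups 1 and 2 are duplicates
queue_after = store.queue()
```

In this example, one decision removes two entries from the queue: the pair between groups 1 and 2 disappears, and the two separate pairs against group 3 collapse into one. This is the sort of effect that makes the queue shrink faster as decisions are applied.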
the rest
Pixiv changed their format recently, so hydrus's default parser broke. This should be automatically fixed this week. Thanks to a user who sent in this fix.
The issue where mouse scroll events were not being caught when a media viewer did not have focus is also fixed.
The 'watcher' page now reports file and check status in the 'status' column! I missed this somehow when I added it for the gallery downloader. This makes it just a little easier to see what a list of threads is currently doing.
I may have fixed the problem where exiting manage tags from a media viewer sometimes drops focus back to the main gui. Please let me know if you still get this (and if so, whether you know a way to reliably repeat this behaviour).
I improved some of the network engine's 'this connection ended early' checks. This may have fixed some issues users had downloading images and page data from some unreliable servers, but if it does not, please send me any incomplete jpegs and the URLs they came from so I can check further on my end. Also, the whole system is more strict about response lengths now, so if you discover false-positive network failures here, please report them.
Also, some server issues related to last week's client api authentication improvements (such as file repository file upload sometimes breaking) should be fixed.
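The stricter 'this connection ended early' check described above can be pictured with a small sketch. This is not the hydrus network engine's code. The function name is hypothetical, and treating any length mismatch (not only a short read) as a failure is an assumption for the example. The useful fact is that Content-Length counts the bytes as sent on the wire, so comparing it against the raw byte count works regardless of content encoding.

```python
from typing import Optional


def download_is_complete(content_length: Optional[str], raw_bytes_received: int) -> Optional[bool]:
    """Compare the raw (still-encoded) byte count against Content-Length.

    Returns True or False when an exact count is available, and None when
    the server gave no usable Content-Length and we cannot tell.
    """
    if content_length is None:
        return None
    try:
        expected = int(content_length)
    except ValueError:
        return None  # malformed header, so no exact count to check against
    if expected < 0:
        return None
    # Assumption: a strict check, so any mismatch counts as a bad response.
    return raw_bytes_received == expected


full = download_is_complete("1000", 1000)
truncated = download_is_complete("1000", 640)
unknown = download_is_complete(None, 640)
```

A caller would presumably reattempt the download when this returns False, following its normal connection reattempt rules, and accept the data when it returns True or None.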
new client api library
If you would like to work on the Client API using Node.js, check out the new module a user wrote here:
https://github.com/cravxx/hydrus.js
This is now in the help along with the rest of the API here:
https://hydrusnetwork.github.io/hydrus/help/client_api.html
full list
- duplicates:
- the final large data storage overhaul job of the big duplicates work is done. potential duplicate information is now stored more sensibly and efficiently. potential pair information is now stored between duplicate file groups, rather than files themselves. when duplicate file groups are merged, or alternate or false positive relationships set, potentials are merged and culled appropriately
- your existing potential data will be updated. the current potential pairs queue size will shrink as duplicate potential relationships are merged
- the duplicate filter now presents file kings as comparison files when possible, increasing pair difference and decision value
- potential pair information is now stored with the 'distance' between the two files as found by the similar-files search system. the duplicate filter will serve files with closer distance first, which increases decision value by front-loading likely duplicates instead of alts. distance values for existing potential pair info are estimated on update, so if you have done search distance 2 or greater and would like to fill in this data accurately to get closer potentials first, you might like to reset your potential duplicates under the cog icon (bear in mind this reset will schedule a decent whack of CPU for your idle maintenance time)
- setting alternate relationship on a pair is now fixed more concretely, ensuring that the same pair will not come up again after various search expansions or resets. this solves some related problems users have had trying to 'fix' larger alternate groups in place. you may see your alternates compared one last time, but that should be the final go. these fixed relationships are merged as intra-alternate group members merge due to duplicate-setting events
- a variety of potential duplicates code has been streamlined based on the new duplicate group relationship
- improved how a second-best king representative of a group is selected in various file relationship fetching jobs when the true king is not permitted by search domain
- one critical part of the new potential duplicates system is more complicated. if you experience much slower searches or count retrievals IRL, please let me know your details
- expanded duplicates unit tests to test potential counts for all tested situations
- fixed a bug where alternate group merging would not cull now-invalid false-positive potential pairs
- the rest:
- updated the default pixiv parser to work with their new format–thank you to a user for providing this fix
- fixed the issue where mouse scroll events were not being processed by the main viewer canvas when it did not have focus
- file page parsers that produce multiple urls through subsidiary page parsers now correctly pass down associated urls and tags to their child file import items
- updated to wx 4.0.6 on all built platforms–looks like a bunch of bug fixes, so fingers-crossed this improves some stability and jank
- updated the recent server access-key-arg-parsing routine to check access from the header before parsing args, which fixes an issue with testing decompression bomb permission on file POST requests on the file repository. generally improved code here to deal more gracefully with failures
- the repositories now max out at 1000 count when fetching pending petition counts (speeding up access when there are large queues)
- the repositories now fetch petitions much faster when there are large queues
- frames and dialogs will be slightly more aggressive about ensuring their parents now get focus back when they are closed (rather than the top level main gui, which sometimes happens due to window manager weirdness)
- rewrote a bad old legacy method of refocusing the manage tags panel that kicks in when the 'open manage tags' action is processed by the media viewer canvas but the panel is already open
- hitting 'refresh account' on a paused service now gives a better immediate message rather than failing after a delay on a confusing 'bad login' error
- improved login errors' text to specify the exact problem raised by the login manager
- fixed a problem in the duplicates page when a status update is called before the initial db status fetch is complete
- the manage tag siblings panel now detects if the pair you wish to add connects to a loop already in the database (which is a rare but possible case). previously it would hang indefinitely! it now cancels the add, communicates the tags in the loop, and recommends you break it manually
- added a link to https://github.com/cravxx/hydrus.js , a node.js module that plugs into the client api, to the help
- a variety of user-started network jobs such as refreshing account and testing a server connection under manage services now only attempt connection once (to fail faster as the user waits)
- the 'test address' job under manage services is now asynchronous and will not hang the ui while it waits for a response
- fixed some unstable thread-to-wx code under the 'test access key' job under manage services
- improved some file handling to ensure open files are closed more promptly in certain circumstances
- fixed some unstable thread-to-wx communication in the ipfs review services panel
- improved the accuracy of the network engine's 'incomplete download' test and bandwidth reporting to work with exact byte counts when available, regardless of content encoding. downloads that provide too few bytes in ways that were previously not caught will be reattempted according to the normal connection reattempt rules. these network fixes may solve some broken jpegs and json some users have seen from unreliable servers
- fixed watcher entries in the watcher page list not reporting their file and check download status as they work (as the gallery downloader does)
- the client api will now deliver cleaner 400 errors when a given url argument is empty or otherwise fails to normalise (previously it was giving 500s)
- misc cleanup
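The tag sibling loop check mentioned in the list above can be sketched as a simple chain walk. This is a minimal illustration and not the manage tag siblings panel's real code. The function name, the dict layout (each tag mapped to the tag it is replaced by), and the example tags are all hypothetical.

```python
from typing import Dict, List, Optional


def find_sibling_loop(siblings: Dict[str, str], from_tag: str, to_tag: str) -> Optional[List[str]]:
    """Return the tags in any loop the new pair would run into, else None.

    'siblings' maps each tag to the tag it is replaced by. The walk follows
    the chain from the new pair and stops at the first tag it sees twice,
    so it always terminates instead of hanging on a loop.
    """
    chain = [from_tag]
    seen = {from_tag}
    current = to_tag
    while True:
        if current in seen:
            # Everything from the first visit of 'current' onwards is the loop.
            return chain[chain.index(current):]
        chain.append(current)
        seen.add(current)
        if current not in siblings:
            return None  # the chain ends cleanly, so the pair is safe to add
        current = siblings[current]


# A loop that already exists in the stored data (rare, but possible).
stored = {"lotr": "lord of the rings", "lord of the rings": "lotr"}
existing_loop = find_sibling_loop(stored, "the lord of the rings", "lotr")

# A new pair that would itself close a loop.
new_loop = find_sibling_loop({"a": "b", "b": "c"}, "c", "a")

# A harmless addition.
safe = find_sibling_loop({"a": "b"}, "c", "a")
```

When the result is not None, a caller could cancel the add and show the returned tags to the user so the loop can be broken manually.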
next week
I had hoped to do some IPFS work this week, but I ran out of time to do it properly. This is now the main job for next week. Otherwise, I will do some of this final duplicates work and some misc small jobs.