windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v424/Hydrus.Network.424.-.Linux.-.Executable.tar.gz
I had a good week. There are some quality of life improvements and faster tag search across the board.
The update will take some extra time this week to rebuild a cache. If you do not sync with the PTR, it will just be a few seconds; if you do, expect about 5-15 minutes.
faster tag search
In the second half of 2020, I tried several times to tune the database for different sorts of wildcard tag search, which is used in all autocomplete lookups and many file searches. I was sometimes able to get small clients always running well, or complicated large systems running well, but I failed to make it good for all situations with code alone: the structure of the database tag lookup cache made the tuning difficult.
So, I have updated how that cache works. Rather than always searching one big master table, the client can now 'zoom in' on a small cache for the appropriate search context, based on which tag domain the current search page or manage tags dialog or whatever else is looking at.
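To illustrate the shape of this idea, here is a minimal sqlite sketch. The table names and schema here are made up for the example, not hydrus's actual schema: the point is that a lookup now queries a small per-service table rather than one master table shared by everything.

```python
import sqlite3

con = sqlite3.connect(':memory:')

# the old way: one gigantic master table shared by every service
con.execute('CREATE TABLE subtags_master ( tag_id INTEGER PRIMARY KEY, subtag TEXT )')

# the new way: a small search cache per tag service, so a lookup only
# touches rows for the domain it is actually searching
# (illustrative names -- not hydrus's real schema)
MY_TAGS, PTR = 1, 2

for service_id in (MY_TAGS, PTR):
    con.execute(
        f'CREATE TABLE subtags_cache_{service_id} ( tag_id INTEGER PRIMARY KEY, subtag TEXT )'
    )
    con.execute(
        f'CREATE INDEX subtags_cache_{service_id}_subtag ON subtags_cache_{service_id} ( subtag )'
    )

def autocomplete(service_id, prefix):
    # 'samus*' becomes a prefix LIKE against the small domain table; a tiny
    # 'my tags' lookup no longer wades through millions of PTR rows
    return con.execute(
        f'SELECT tag_id, subtag FROM subtags_cache_{service_id} WHERE subtag LIKE ?',
        (prefix.rstrip('*') + '%',)
    ).fetchall()

con.execute(f"INSERT INTO subtags_cache_{MY_TAGS} VALUES ( 1, 'samus aran' )")
print(autocomplete(MY_TAGS, 'samus*'))
```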
Pretty much anything related to autocomplete and tag-based file searches is faster. Most importantly, the worst-case time for these searches is greatly improved. Complicated searches, like a 'namespace:*anything*' file search, should no longer have sudden gigantic lag spikes. A big search may still take ten seconds or more when it covers millions of tags and files, but a tiny 'my tags' search with only 60 results will no longer accidentally lag out for two minutes.
The only exception in my testing is that 'number of tags' searches still have bad cancelability. It is better than before, but not great. I'll keep working here.
The new cache replaces an existing one and will take some time to build on update. If you do not sync with the PTR, it should just be a few seconds. If you sync with the PTR on an SSD, it should be 5-15 minutes (on my heavy client with a nice SSD, it was 7 minutes). If you sync with the PTR on an HDD, it will take significantly longer, so please plan for it. PTR syncers will see some numbers count up as the different parts of the cache are built: some deletion work to start, then counting up to perhaps a million, and then one big count up to 16 million or so at about 30,000 rows a second (that last stage alone is roughly nine minutes).
I have more plans here, and more work to do to optimise the tag display system, but I will let this new cache breathe for a bit before going back in here with a machete.
full list
- new tag caches:
- as 2020 ended, I attempted but failed to tune fast search for all kinds of clients, big and small and simple and complex. unable to guarantee decent speeds with just code, I have redesigned the tag text search cache. rather than checking the gigantic master table for all namespace and subtag lookups, the client can now zoom in on a small fast cache limited to the current search context, so doing a clever lookup on 'my tags' will no longer be hampered by having PTR beside it, and doing a solid lookup on the PTR or 'all known tags' will no longer be accidentally hampered by an optimisation for another situation
- the 424 update will take some time to generate the new caches for your existing data. if you don't sync with the PTR, it should be a few seconds. if you do sync, it will be about ten minutes on an SSD (seems about 30,000 definitions a second), and somewhat longer on an HDD. it will count up the tags as it goes, and on the PTR there will be a bit of deletion work, then one or two counts up to perhaps a million, and then one big count up to about 16 million.
- in my initial tests, this cache adds about 1-2% additional processing time to mass tag changes, but a wide variety of tag lookups and file searches are now significantly faster, have much milder worst-case lag spikes, and should cancel quicker. the gains are largest when searching a specific tag domain, although 'all known tags' should still be much better. a future expansion of the tag cache is planned to finally deliver clean and accurate 'all known tags' searches
- in summary, all of these should be faster and cancel faster:
- autocomplete searches for 'subtag*' (most normal searches) are optimised
- autocomplete searches for 'namespace:*' are optimised, including when the namespace itself is a wildcard
- autocomplete searches for wildcards with an asterisk in the middle of the subtag are optimised
- autocomplete searches for wildcards with an asterisk at the beginning of the subtag are optimised (but this is still generally the slowest query; the sketch after this list shows why)
- autocomplete searches for namespace and subtag wildcard combinations are optimised, with either or both as a wildcard of any type
- autocomplete searches for '*' are optimised
- unnamespaced tag file searches (i.e. in file search, where the tag can match any namespace) are optimised
- namespace file searches are optimised, including when the namespace is a wildcard
- wildcard file searches are optimised, for all the classes of wildcard above
- 'tag as number' file searches are optimised
- 'has ><= x namespace tags' file searches are optimised for speed, including when the namespace is a wildcard, but still have bad cancelability on large domains. I'll work on this more
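As a rough illustration of why the wildcard classes above perform differently (this is just a sketch, hydrus's real conversion handles more cases), a trailing asterisk becomes an anchored prefix pattern, while a leading asterisk has no anchor and forces a check of every subtag row in the domain's cache:

```python
def wildcard_to_like(wildcard):
    """Translate a hydrus-style wildcard into an SQL LIKE pattern.
    Illustrative only -- the real conversion handles more cases."""
    return wildcard.replace('*', '%')

# 'samus*'   -> 'samus%'   : anchored prefix, the fast common case
# 'sam*aran' -> 'sam%aran' : mid-asterisk, still anchored at the start
# '*aran'    -> '%aran'    : no anchor, so every subtag in the domain's
#                            cache must be checked -- the slowest class
for w in ('samus*', 'sam*aran', '*aran'):
    print(w, '->', wildcard_to_like(w))
```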
- .
- other tag cache info:
- the 'tag text search cache' regeneration routine under the _database->regenerate_ menu is replaced with a service specific routine for the new cache
- on boot, if the client sees any of the new cache tables are missing, it notifies you and regenerates the affected subsection of the cache
- an old method of performing complex wildcard searches was fetching surplus data and has been eliminated, so these searches are now computationally cheaper on top of this week's other domain-based optimisations
- I have identified the next bottleneck in the tag search pipeline and have a plan to speed all the above up even further, which can all be done in code
- thanks to user feedback, I have also identified other wasteful overhead in tag processing. I'll keep working!
- while the planned 'all known tags' cache will be useful since most file searches are in this domain, it will be a bit of work, so I will first let this new lookup cache breathe for a bit. 'all known tags' will not be nearly as big as the 'all known files/combined file' caches that have hit us with so much CPU recently. I expect it to increase the client.caches.db size by about 5%
- unified all increments and decrements to autocomplete count caches, no matter the service domain, to one location (a sketch of the idea is at the end of this block)
- unified how autocomplete counts are fetched across different service domains
- optimised specific and combined autocomplete count cache update overhead for new, existing, and deleted tags
- optimised display autocomplete count cache updates for tags with multiple siblings or parents
- optimised the 'local tags cache', which does fast tag text fetching for local files, when new tags or files are added/removed from the 'all local files' domain. this now occurs in the same unified autocomplete count update process. it now also caches pending tags that have no current count
- merged 'exact match' autocomplete tag searching code into generalised wildcard search
- misc autocomplete and other tag code cleanup and harmonisation
- ditched some old mass UNION queries that were not cancelling well
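A minimal sketch of what 'unified to one location' means here, with invented table names and none of the real cache's complexity: every count change, whatever its origin, goes through one routine that applies (current, pending) deltas in a single statement. This uses sqlite's UPSERT, so it needs SQLite 3.24+.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute(
    'CREATE TABLE ac_counts_1 ( tag_id INTEGER PRIMARY KEY, '
    'current_count INTEGER, pending_count INTEGER )'
)

def update_ac_counts(service_id, deltas):
    """Apply all autocomplete count changes through one code path.
    deltas: iterable of (tag_id, current_delta, pending_delta).
    Illustrative only -- hydrus's real cache has more moving parts."""
    con.executemany(
        f'INSERT INTO ac_counts_{service_id} ( tag_id, current_count, pending_count ) '
        'VALUES ( ?, ?, ? ) '
        'ON CONFLICT ( tag_id ) DO UPDATE SET '
        'current_count = current_count + excluded.current_count, '
        'pending_count = pending_count + excluded.pending_count',
        deltas
    )
    # rows that fall to zero are culled, so pending-only tags stay accurate
    con.execute(
        f'DELETE FROM ac_counts_{service_id} '
        'WHERE current_count = 0 AND pending_count = 0'
    )

update_ac_counts(1, [(5, 1, 0), (7, 0, 1), (5, -1, 0)])
```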
- .
- the rest:
- when you paste queries into a sub, the summary 'these were/were not added' dialog now always appears, and if you paste empty whitespace, it now says so
- the manage siblings/parents dialogs now specify which services apply which siblings, whether they are fully synced, the current display tag sync maintenance settings, and ultimately whether you can expect changes to apply quickly after dialog ok
- when a text entry dialog comes with suggestion buttons, it now focuses the text box by default. sorry for the trouble here! (issue #765)
- updated a couple petition reason suggestions in manage tags and parents
- added a shortcut to 'main window' to refresh _manage tags'_ related tags suggestions with 'thorough' duration. in future, these dialog-specific actions will be moved out of 'main window'; this is just a 'temporary' patch
- updated the 'running from source' and 'install' help with some new numbers and info about mpv, and updated the 'server' help with a document helpfully provided by a user explaining that the server does not do what many new users think
- sped up 'has tags' file searches in certain situations, mostly when there are few if any other search predicates
- the default e621 parser now pulls meta tags, thank you to a user for providing this
- the default nitter timeline url classes are updated, thank you to a user for providing this
- the new little hook that takes 'file:///' off of paths pasted into the filename tagging path text now also normalises the path, so if you are on Windows, the URI's slashes will be Windows-corrected to backslashes. it also now removes wrapping quotes (there is a sketch of this cleaning at the end of this list)
- the hydrus logger again correctly restores stdout and stderr after it is closed on program exit (this was disabled for some reason, but fingers crossed it seems fine now!)
- fixed an issue where the automatically started duplicate potentials file search could not be cancelled when the shutdown 'stop work' button was clicked or when idle maintenance mode turned off
- the shutdown maintenance work for the first client shutdown now has a little text saying it is just some quick initialisation work
- for hopefully the last and completely final time, I think I fixed the invalid tag repair function for certain sorts of tags applied to currently local files
- improved the way a job thread was pulling new jobs (issue #750)
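Here is a quick sketch of the pasted-path cleaning described above. This is the behaviour as described, not hydrus's actual code, and the function name is invented:

```python
import os
from urllib.parse import unquote, urlparse

def clean_pasted_path(text):
    """Tidy a path pasted into the filename tagging dialog.
    A sketch of the behaviour described above, not hydrus's actual code."""
    text = text.strip().strip('"').strip("'")  # remove wrapping quotes
    if text.startswith('file:'):
        # 'file:///C:/some%20dir/pic.jpg' -> '/C:/some dir/pic.jpg'
        path = unquote(urlparse(text).path)
        if os.name == 'nt' and path.startswith('/'):
            path = path[1:]  # drop the leading slash left over from the URI
        text = path
    # on Windows, normpath corrects forward slashes to backslashes
    return os.path.normpath(text)

print(clean_pasted_path('"file:///C:/some%20dir/pic.jpg"'))
```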
next week
The poll is done! Here's the link again: https://www.survey-maker.com/results3310902xA574481e-102#tab-2
Multiple local file services has won. It looks like better URL sharing and file alternates will be soon after, as well. Thank you for voting; seeing what isn't popular is as useful as seeing what is.
Unfortunately, I cannot start that immediately. I have a fire to put out next week related to the network objects lagging too much when saving their updates. I will spend the rest of Q1 doing the delayed network improvements. So, with luck, I will get going on local file services in Q2.
I also have a ton of messages to catch up on!