windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v409/Hydrus.Network.409.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v409/Hydrus.Network.409.-.Windows.-.Installer.exe
macOS
app: https://github.com/hydrusnetwork/hydrus/releases/download/v409/Hydrus.Network.409.-.macOS.-.App.dmg
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v409/Hydrus.Network.409.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v409.tar.gz
I had a great week fixing some bugs and optimising the new tag siblings cache. The new code works much faster now.
siblings
I am very happy that there do not seem to have been any obvious errors with the new sibling database cache. Unfortunately, a couple of areas were working inefficiently, which IRL testing helped to diagnose. I put a lot of time into this over the week and was very successful - some sections take 10% less time, some 90%, and one critical query now takes 99% less time. It depends on many factors, but many things are faster overall. In particular, tag processing speed, which took a real hit, is back up to good speed, and setting new tag sibling application rules now only needs to regenerate for changed siblings, so if you apply your own five 'my tags' siblings to the PTR (or remove them), the client now only has to do two seconds of work, not ten minutes.
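To illustrate the 'only regenerate for changed siblings' idea, here is a minimal Python sketch. It is not the actual hydrus code: the function name and the plain dict of 'bad tag -> ideal tag' pairs are assumptions for illustration, and a real client would also have to follow whole sibling chains. It diffs the old and new applied sibling maps and returns only the tags touched by a changed pair.

```python
def tags_needing_regen(old_siblings, new_siblings):
    """Return the tags touched by any sibling pair that was added, removed or changed.

    Both arguments map a 'bad' tag to its 'ideal' tag (a hypothetical representation).
    """
    old_pairs = set(old_siblings.items())
    new_pairs = set(new_siblings.items())

    # pairs present on exactly one side are the ones that changed
    changed_pairs = old_pairs ^ new_pairs

    affected = set()
    for bad_tag, ideal_tag in changed_pairs:
        affected.add(bad_tag)
        affected.add(ideal_tag)

    return affected


# pretend these are the siblings already applied from the PTR
ptr_siblings = {
    'lotr': 'series:lord of the rings',
    'sw': 'series:star wars',
}

# now one extra 'my tags' sibling is applied on top
with_my_tags = dict(ptr_siblings)
with_my_tags['lord of the rings'] = 'series:lord of the rings'

print(sorted(tags_needing_regen(ptr_siblings, with_my_tags)))
# ['lord of the rings', 'series:lord of the rings']
```

The untouched 'sw' pair never appears in the result, which is the point: the work scales with the number of changed siblings rather than the size of the whole sibling set.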
I made some progress on the final awkward things to migrate. Most autocomplete results you see are now able to give themselves the 'will display as xxx' label when needed and match against sibling input (e.g. having an input of 'lotr' match 'series:lord of the rings' due to siblings) on their own, which should save some CPU time when typing. There is still more to do, so I'll keep hammering at it for the next two weeks and see if I can get rid of 'loading tag siblings' on boot before I start on parents db cache.
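As a rough sketch of what 'on their own' means here, the following Python shows an autocomplete result that carries its own sibling information. The `Predicate` class, its field names and the substring matching are assumptions for illustration, not the real hydrus objects or search logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Predicate:
    tag: str
    # the tag this one will be displayed as, if it has a better sibling (hypothetical field)
    ideal_sibling: Optional[str] = None
    # every tag in the same sibling chain (hypothetical field)
    chain_members: frozenset = frozenset()

    def matches(self, search_text):
        # match against the tag itself or any of its known siblings
        return any(search_text in t for t in {self.tag, *self.chain_members})

    def label(self):
        if self.ideal_sibling is not None and self.ideal_sibling != self.tag:
            return f'{self.tag} (will display as {self.ideal_sibling})'
        return self.tag


chain = frozenset({'lotr', 'series:lord of the rings'})

results = [
    Predicate('series:lord of the rings', chain_members=chain),
    Predicate('lotr', ideal_sibling='series:lord of the rings', chain_members=chain),
    Predicate('character:frodo'),
]

print([p.label() for p in results if p.matches('lotr')])
# ['series:lord of the rings', 'lotr (will display as series:lord of the rings)']
```

Because each result already knows its ideal and its chain, the UI does not need a separate sibling lookup for every keystroke.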
If you have been waiting for faster code before you update, you might want to wait another week. I just did a test re-do of the 407->408 update step in IRL conditions, and it was not as fast as I wanted. I'll keep pushing at this.
I am increasingly looking forward to doing that parents db cache, which will extend this work in a new dimension. That will be v412, which I am very confident will be another two-week release. This is some of the most intricate work I have done.
full list
- siblings:
- the slowest of the new sibling regen & update code has received a full optimisation pass. some sections take 10% less time, some 90%, and one critical query takes 99% less time. overall, several big jobs work much faster, and ptr processing, which slowed significantly for many users, should be back up to a good speed. uploading pending tags (which tend to be for local files) should be much faster in particular. let's do another round of IRL observation and profiling this week, and I'll keep at it
- the various 'display' regeneration routines now provide more progress status text, drilling down to the x/y siblings being collapse-counted, or number of files added to a cache, and generally all tag sibling regen got a status update polish pass
- optimised the way tag sibling application is set. now, only the tag siblings that are changed need to have their counts regenerated. hence, if you just apply (or remove) your own five 'my tags' siblings onto the PTR, the client now only has to do two seconds of work, not ten minutes
- .
- the rest:
- fixed the annoying issue with media viewer mouseovers stealing focus/activation from the manage tags dialog. this can now only happen if current focus is on a hover window. sorry for the delay!
- updated manage tag parents dialog to state the pairs being petitioned on the 'petition reason entry' dialog
- updated manage tag parents and siblings dialogs to have appropriate 'reason' suggestions for petitions (previously, they were inheriting the same suggestions as for add)
- ipfs network jobs now have a minimum 'reply' connection timeout of two hours (so giganto directory pushes won't throw an error). connection timeout remains the same, so if the server is hanging on that, it'll still notice
- fixed the 'test address' button on the IPFS manage services panel
- petitioning an IPFS file when there is no IPFS multihash entry in the db no longer causes an error. now, in this case, the file entry is removed with no change made.
- when pending to or petitioning from a file service, a quick filter is now applied to discard invalid files (i.e. (not) already in the service). any weird logical holes where this might occur should now be fixed
- export folders now catch and report missing file errors more nicely
- export folders now remember the last error they encountered and report that in the edit export folders dialog
- .
- boring tag siblings optimisations:
- optimised the tag manager generation routine to use any common file domains for fast cache lookup for any subset of the files available, rather than falling back to 'all known files' domain when there is no single common file domain
- optimised the new 'all known files' display autocomplete cache to use similar faster specific files cache lookups when available
- optimised how the 'all known files' display cache regenerates tag sibling chains. it now takes a shortcut when given non-sibling tags and tags where all but one sibling member have zero count, and it can count current and pending counts separately according to the most efficient counting method (e.g. most pre-display pending counts are 0 across the board, so even if current count is a million, the pending count can often be assumed without lookup overhead). furthermore, the 'clever' count has better query planning and less non-sqlite data overhead, and with experimental data is now chosen more carefully. what was previously a 22s job on a test database now takes 5s
- deduplicated how new mappings are filtered to all the specific cache domains, significantly reducing overhead
- massively optimised a critical - and the slowest - part of the new 'combined' cache that handles add/pend mappings pre-insert presence testing, speeding up the core query about 100x!
- reduced some overhead when doing file service_id normalisation in repository processing
- split up specific chain regen into groups to reduce memory usage
- optimised specific display tag cache 'add file' updates, and thereby basic cache regeneration, to be just a little faster for files that have multiple sibling tags
- all predicates made in the database are now populated with ideal and chain sibling information, and this is used for '(will display as xxx)' labels and autocomplete tag search filtering (e.g. you type in 'lotr', it matches an autocomplete result of 'lord of the rings'). there are still some ui-made predicates to figure out, so the old system remains as a fallback
- related tags lookup is a tiny bit faster and now populates its predicates with ideal and chain sibling info at the db level
- cleaned up some 'fetch related tags' code, might make it a bit faster for large tag counts
- cleaned up the way some mapping tables are fetched
- unified table/table_name nomenclature in the db code
- updated an old data->ui status presentation method (it typically does stuff like "regenning some stuff: 500/10,000"), to not hog so much UI time and not yield worker threads so often when new statuses are coming in real fast
- several late optimisations based on IRL testing
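The sibling chain counting shortcut mentioned in the list above (skipping the expensive work for non-sibling tags and for chains where all but one member have zero count) can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the actual db code: the function name, the `storage_counts` dict and the `files_with_tag` callable are all hypothetical stand-ins.

```python
def display_count(chain_members, storage_counts, files_with_tag):
    """Count the files that would display a sibling chain's ideal tag.

    chain_members: all tags in one sibling chain (a non-sibling tag is a chain of one)
    storage_counts: tag -> pre-sibling count, assumed cheap to look up
    files_with_tag: callable, tag -> set of file ids, standing in for the expensive query
    """
    nonzero = [t for t in chain_members if storage_counts.get(t, 0) > 0]

    if len(nonzero) == 0:
        return 0

    if len(nonzero) == 1:
        # shortcut: only one member has any files, so nothing can overlap
        return storage_counts[nonzero[0]]

    # slow path: members may share files, so count distinct files
    files = set()
    for tag in nonzero:
        files |= files_with_tag(tag)

    return len(files)


mappings = {
    'lotr': {1, 2},
    'series:lord of the rings': {2, 3},
    'sw': {4},
    'series:star wars': set(),
}
storage_counts = {tag: len(files) for tag, files in mappings.items()}

expensive_calls = []

def files_with_tag(tag):
    expensive_calls.append(tag)
    return mappings[tag]

print(display_count(['sw', 'series:star wars'], storage_counts, files_with_tag))  # 1
print(expensive_calls)  # []
print(display_count(['lotr', 'series:lord of the rings'], storage_counts, files_with_tag))  # 3
```

In the second case, file 2 has both tags, so simply adding the two counts would give 4. That overlap is why the slow path exists and why skipping it where possible saves time.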
next week
Next week would normally be 'cleanup', but all the optimisation I did here kind of counts as that, so I'll make sure to do some small jobs, just so I am not neglecting other things. That means GitHub issues and other non-sibling work.