windows
zip: https://github.com/hydrusnetwork/hydrus/releases/download/v304/Hydrus.Network.304.-.Windows.-.Extract.only.zip
exe: https://github.com/hydrusnetwork/hydrus/releases/download/v304/Hydrus.Network.304.-.Windows.-.Installer.exe
os x
app: https://github.com/hydrusnetwork/hydrus/releases/download/v304/Hydrus.Network.304.-.OS.X.-.App.dmg
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v304/Hydrus.Network.304.-.OS.X.-.Extract.only.tar.gz
linux
tar.gz: https://github.com/hydrusnetwork/hydrus/releases/download/v304/Hydrus.Network.304.-.Linux.-.Executable.tar.gz
source
tar.gz: https://github.com/hydrusnetwork/hydrus/archive/v304.tar.gz
I had a great week. There is a bunch more downloader work and some new shortcuts.
tag blacklists
After a long time, the client now supports tag blacklists! I've put them under tag import options.
They use the newer 'tag filter' object, which scans a file's tags as they come in, and if it sees any tag it would exclude (like 'scat' or whatever else you might not want), it automatically stops the file from importing.
On the newer download systems, this vetoes the file before it is downloaded (saving you some time and bandwidth), but the legacy downloaders still download the file and tags together, so they have to stop after the download is done. There are also, for now, two locations where default tag import options can be set. This ambiguity should all be cleaned up in the coming weeks as I move everything over to the new systems.
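If you are interested, here is a quick python sketch of the general idea--the class and function names here are illustrative, not hydrus's actual internals:

class TagFilter:

    def __init__(self, blacklisted_tags):

        self._blacklisted_tags = set(blacklisted_tags)

    def get_vetoing_tags(self, tags):

        # any incoming tags that hit the blacklist
        return self._blacklisted_tags.intersection(tags)

class VetoException(Exception):
    pass

def check_tags_or_veto(tag_filter, parsed_tags):

    bad_tags = tag_filter.get_vetoing_tags(parsed_tags)

    if bad_tags:

        # on the new downloaders, this fires before the file is downloaded
        raise VetoException('tag blacklist: ' + ', '.join(sorted(bad_tags)))

tag_filter = TagFilter(['scat'])
check_tags_or_veto(tag_filter, {'blue eyes'})  # fine, no blacklisted tags
# check_tags_or_veto(tag_filter, {'blue eyes', 'scat'}) would raise VetoException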
If you have been waiting for this, please give it a go and let me know how it works for you. It seems ok in my testing, but I may have let some unusual situations fall through the cracks.
url normalisation and other downloader work
An important objective of this downloader overhaul has been to 'normalise' URLs--to collapse the different ways you can write a single URL into a single comparable format that is not only clean and pretty but also makes it easy to determine if the client has seen and downloaded it before. The new 'url classes' system does this, and in 304, the client will apply URL normalisation to all incoming import URLs. This mostly matters for boorus like e621 that append some random tags at the end of the URL as a description, but it will also convert any matched legacy http definitions or drag-and-drops to https automatically.
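To illustrate, here is a rough python sketch of the flavour of normalisation a url class performs--the exact rules per site live in the url class itself, so this is just the shape of it, not the real code:

from urllib.parse import urlparse, urlunparse

def normalise_url(url, num_significant_path_components=3):

    # a real url class knows exactly which path components and query
    # params are significant for its site; this toy just keeps the first few
    p = urlparse(url)

    path = '/'.join(p.path.split('/')[:num_significant_path_components + 1])

    # force https and drop the insignificant stuff
    return urlunparse(('https', p.netloc.lower(), path, '', '', ''))

# both of these collapse to https://e621.net/post/show/1234
normalise_url('http://e621.net/post/show/1234')
normalise_url('https://e621.net/post/show/1234/some-appended-tags')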
The way the client determines if it has seen a URL before--particularly when it has not been matched by the new system (an unknown url class)--is also much improved. The client can now better deal with conflicting data, like multiple files claiming the same URL, without either redownloading every time or abandoning the attempt entirely.
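The aggregation logic works roughly like this python sketch (hypothetical names, simplified statuses):

def aggregate_url_statuses(statuses):

    # given the file statuses found for all the hashes a url maps to,
    # pick one sensible answer instead of redownloading or giving up

    if 'already in db' in statuses:

        # prefer a positive match over a deleted one when hashes conflict
        return 'already in db'

    if 'deleted' in statuses:

        return 'deleted'

    return 'new'

aggregate_url_statuses(['deleted', 'already in db'])  # 'already in db'
aggregate_url_statuses([])                            # 'new'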
The main import object will now also handle certain import outcomes, like a 404 or the new tag blacklist event, more gracefully (with the new 'veto' status), and it will present more information in its '23 successful, 5 skipped' status texts.
I've added several basic parsers to the simple downloader for yiff.party as well–thanks to @cuddlebear on the discord for the submission. I expect to do more here in future as well (likely making a watchable url class and full-blooded PageParser so you can 'watch' a yiff.party stream like a thread and even a subscription).
Due to a mistake in the update code, this week's update will accidentally reset the simple downloader parsers to the defaults (which include the new yiff.party parsers)--if you have a bunch of custom simple downloader parsers set up, please export them before you update, or wait a week for the fixed update code (which will just add the new parsers to whatever already exists) to roll out.
new shortcuts
You can now set shortcuts for opening the new downloader pages (urls, simple, and thread watcher) and the duplicate filter and page of pages, all under the 'main_gui' shortcut set. Support for individual file pages should come in the near future.
And you can now shortcut all the duplicate-setting actions (like 'set these all as alternates' or even 'the focused one is better than all the others selected') under the 'media' shortcut set. These are advanced commands, so if you don't get the duplicate filter yet, stay away!
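Under the hood, a shortcut set is basically a map from key combinations to application commands. Here is a toy python sketch of the dispatch--all the names here are hypothetical, not the real command identifiers:

def execute_application_command(command):

    print('executing: ' + command)  # stand-in for the real dispatcher

SHORTCUT_SETS = {
    'main_gui': {
        ('ctrl', 'shift', 'u'): 'new_page_url_downloader',
        ('ctrl', 'shift', 'w'): 'new_page_thread_watcher',
    },
    'media': {
        ('ctrl', 'shift', 'b'): 'duplicate_media_set_focused_better',
    },
}

def process_shortcut(shortcut_set, key_combo):

    command = SHORTCUT_SETS.get(shortcut_set, {}).get(key_combo)

    if command is not None:

        execute_application_command(command)

process_shortcut('main_gui', ('ctrl', 'shift', 'u'))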
full list
- renamed the new 'tagcensor' object to 'tagfilter' (since it will end up doing a bunch of non-censoring jobs) and refactored it into clienttags
- attached a tag filter object to all tag import options to act as a tag blacklist. all tags that go through the import pipeline (except for a couple of old legacy instances) are now checked against the blacklist, and if a bad tag is found, the file vetoes! tag import options has some new ui to handle this and background code to deal with inheritance from defaults and so on
- new file import urls that have url classes, no matter their source, are now normalised!
- all new file import urls are now tested against both the original and normalised versions of the url, so even though previously parsed urls remain un-normalised, a new url that only matches after normalisation will still not count as new! -fingers crossed-
- on update, the db will get normalised copies of all existing urls. this means many files will now have two versions of their urls--some ui to collapse everything down to only the normalised version (after some human eyes have checked this big change) will come in the coming weeks
- some sites where normalisation is a consistent problem for later redownloads (like e621, which appends 'preview' tags to the post url) _should_ now be caught reliably!
- the 'allow subdomains' on edit url class panel is now named 'match subdomains' and has a tooltip to better explain how it works
- 'keep subdomains' is now 'keep matched subdomains' and has a tooltip as well
- the 'keep matched subdomains' enabled behaviour (and some normalisation calculation) is now additionally governed by the 'associate url with files' value and api url conversion info rather than just 'match subdomains' and raw url type
- fixed an issue that was stopping the 'associate url with files' option sticking in edit url class panel
- edit url matches now resorts after an add or edit action
- all listctrls with a wrapper panel now resort after an import from clipboard, png, or defaults call
- url matches now match against www*. versions of their domain regardless of 'match subdomains' settings
- updated xbooru url classes to prefer https
- the manage url class links panel now has a 'clear' button to clear a url_class->parser link
- introduced three new simple downloader parsers for yiff.party, thanks to @cuddlebear on discord for the submission
- the old 'uninteresting mime' status has been expanded to a wider 'vetoed' status to represent all file imports that are abandoned without a particular error (e.g. tag blacklist, wrong filesize or resolution)
- the import system now reports the total of 'num vetoed' as 'num ignored' in its summary statements
- it now also reports 'num skipped'
- the 'num successful' and 'num already in db' are now folded more neatly together in import cache summary statements
- file downloads that are cancelled will now set a 'veto' state rather than a 'skip' state
- improved file import exception handling across the board
- improved how single-file-result parsing vetoes propagate up to the file import status cache
- 404 network errors will now provide a 'veto' status rather than an 'error'
- vetoes will not count as errors when deciding whether a subscription should be abandoned early (so a bunch of decomp bombs or 404s will no longer stutter a subscription! a quick sketch of this logic follows the list)
- misc fixes and improvements to the new download stuff
- wrote a new parsing cache that saves a lot of work in the new parsing system
- improved the 'is this url known?' test to better deal with situations where all the given urls are galleries or unrecognised--a better aggregate of file status is formed, and 'already in db'/'deleted' statuses will apply if there is no evidence otherwise (I got the new logic for this from a legit nightmare about urls downloading over and over, so let's hope it works out)
- the 'is this url known?' logic also recovers from 1->n url->hash relationships where it does not expect them, trying to find 'already in db' hashes over 'deleted' ones
- to clear up some ambiguity, galleries or subscriptions now give a different 'checking in x seconds' status when waiting on the first page of a query
- the 'noneablebytescontrol', as seen in edit file import options, will now correctly disable/enable its bytes sub-control when it is none'ed
- a persistent issue with the new network engine sometimes failing to correctly error after certain broken connections (the computer going to sleep mid-download was a common cause here) should now be recovered from and the connection naturally reattempted
- added three new shortcuts to the 'main_gui' shortcut set that allow for opening a new 'urls', 'simple', or 'thread watcher' downloader page
- added two more shortcuts to 'main_gui' for new 'page of pages' and 'duplicate filter page'
- moved some old 'new page' menu code to the new application command system
- added numerous 'duplicates' shortcuts to the 'media' shortcut set that will work on selections of thumbnails
- the thumbnail duplicates menu actions now go through the new application command system
- fixed an issue where the current tag parents caches was not refreshing when notified
- inputting short invalid syntax such as '-' into a 'read' tag autocomplete will now clear the system predicates list--system preds should now only show on a completely empty input
- fixed an issue where certain combinations of 'remove a tag, then re-add it' nullipotent actions in a single manage tags dialog transaction were not applying reliably (sometimes, the subsequent mirror action was not occurring due to a processing re-order optimisation at the db level)
- made some animation code a little safer and quieter as a test for some users who were getting blitzed with some deadwindow error spam in certain situations--let's see if this changes anything
- replaced all the em dashes in the help with double hyphens as github pages was rendering them wrong
- added CrystalDiskInfo recommendation to 'help my db is broke.txt'
- misc cleanup
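As promised above, here is a quick python sketch of the veto/error distinction in the subscription abandonment check--illustrative only, not the actual code:

def should_abandon_subscription(statuses, max_errors=5):

    # vetoes (404s, tag blacklist hits, decomp bombs) are deliberate
    # skips, so only genuine errors count towards the abandon threshold
    return statuses.count('error') >= max_errors

should_abandon_subscription(['veto'] * 10)  # False--vetoes never abandon
should_abandon_subscription(['error'] * 5)  # True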
next week
Now that all the new urls going into the system are normalised, I would like to get the gallery and subscription downloaders to start using the new system wherever it can find a parsing solution. I and other users can then start adding parsers, and it should all naturally migrate over the coming weeks.
I've also still got plenty of small stuff to work on.