1e8781 No.9068
ITT: create proposals for making Hydrus more optimized.
Proposal: Why can't Hydrus switch to MariaDB?
If it is faster, then it should be better. The only trouble is having the need to rewrite the queries, which from an SQL standpoint should be a non-issue, right?
List of Databases with Open Source License and Open Source APIs:
SQLite - Currently used in Hydrus, has minimal features
MySQL - A more well-rounded SQL Database with user management
PostgreSQL - An SQL with complex features with less performance
MariaDB - SQL/NoSQL database with heavy optimizations
ElasticSearch - A literal search engine instead of a normal Database
Teradata - IDK
https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems
https://www.infoworld.com/article/2611812/mysql/mysql-face-off--mysql-or-mariadb-.html
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.
21844b No.9094
Honestly, there is lower-hanging fruit to pick in order to optimize Hydrus before even touching its database. After the initial processing of mappings, the bulk of I/O access is spent on the files themselves, which AFAIK is single-threaded.
89d121 No.9096
>>9094
Multi-threaded Python won't end too well… Some say Go or Rust, but I know it is a meme to rewrite everything.
5105b1 No.9112
>>9077
KDE/Plasma also starts up a MySQL/MariaDB instance for everything PIM-related, and users hate it because the developers never managed to write their software in a way that wouldn't crash the database. All in all, I think the startup time required for a MySQL-ish database is negligible on a modern system, but the amount of code required to make it act like an embedded database is astronomical and the exact opposite of what this project needs.
9b4af4 No.9117
What about using FreeNAS in conjunction with Hydrus for ZFS-like performance?
Or is there a distro that is best suited for image and file hoarding with RAID-like redundancy?
9b4af4 No.9118
>>9112
Well, can we lay out the pros and cons of an embedded database vs. an optimized database like MariaDB?
833c67 No.9120
Would it be possible to use some ORM library for SQL and let the user choose the SQL backend?
49430e No.9121
I would not mind running a MariaDB daemon for Hydrus.
In fact, I am running one right now, and it would be great if I could set Hydrus up to just connect to an existing database.
9b4af4 No.9123
>>9068
What about file system parities? Would installing Hydrus on FreeNAS with ZFS be a good idea? What about Linux with BTRFS?
b241f3 No.9182
b241f3 No.9183
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-i-purpose-and-best-practices/
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-ii-hardware-specifics/
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-iii-pools-performance-and-cache/
http://www.freenas.org/blog/a-complete-guide-to-freenas-hardware-design-part-iv-network-notes-conclusion/
http://www.freenas.org/blog/freenas-worst-practices/
Some of the points:
1. 8GB of RAM minimum, 12GB minimum if using plugins or jails, 1GB RAM per 1TB (conservative) or 3TB (liberal)
2. Don't use RAID controllers, just use Host Bus Adapters to connect the drives to the motherboard (software "RAID")
3. FreeNAS needs bare metal, NOT VMs (but putting plugins or jails into FreeNAS is a good idea)
4. Intel CPUs have more support than AMD, and LSI has the best Host Bus Adapters (Marvell and JMicron are okay)
5. 7200 RPM SAS or Enterprise SATA will work as HDD, do not use desktop drives for this to prevent IO errors
6. RAIDZ1 is like RAID 5, RAIDZ2 is like RAID 6, RAIDZ3 has triple parity, each vdev/group only has one-drive speeds
7. "ZFS intent log" should be on RAM (and on power-protected SSD if you wish), without it the whole vdev would fail
1e8781 No.9217
https://ponyorm.com/ can actually simplify SQL queries into something more python-friendly.
0869af No.9281
fb9533 No.9323
>>9068
>PostgreSQL - An SQL with complex features with less performance
t. Uber
c4095c No.9348
>>9068
How about rewriting this in C++/Qt? If you limit yourself to Qt syntax then it is surprisingly similar to Python with some C++ quirks. Multi-threading and starting separate processes are easier in Qt than in Python.
Something like this could be easiest done this way:
>Make code modular and switch the GUI to PyQt/pyside while still using python.
>Experiment with the GUI code, perhaps try using QML to facilitate the GUI proposals from that one anon that made all the cool mockups. See >>8185
>Debate if it is even required to switch to c++ anymore since many qt goodies can be used via above mentioned libraries(threading/process starting/native notifications, etc).
I haven't taken a look at the code, but if it is already written in a modular way then this shouldn't be too hard, provided the dev can stay motivated and people can live with a few months of only critical bug fixing.
02d3aa No.9391
>>9348
That is the issue, the dev is trying to migrate from wxPython to PyQt after the downloader overhaul, along with other key functions like parallel downloads, workflow management and mobile integration.
be1efa No.9464
d8220f No.9530
cfc291 No.9658
As mentioned by >>9094 the bottleneck is mostly how the I/O and CPU is handled by hydrus. Imports are done sequentially when they can be sped up a lot by using multiprocessing. I'm sure other actions are still done sequentially too. A transition to a graph database like ArangoDB could be better in the long run, but that's never going to happen.
Looking at the client.master.db database, I'm not sure why he added an index to the md5, sha1 and sha512 columns but not to the subtag or namespace columns. Doesn't make sense to me (and is the sha512 index really necessary?). Also it boggles my mind that foreign keys aren't being used at all.
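Whether an index on a text column is actually missing can be checked with EXPLAIN QUERY PLAN before changing anything. Below is a minimal sketch against an in-memory database; the table `subtags`, its columns, and the index name `idx_subtags_subtag` are stand-ins made up for illustration, not the real Hydrus schema. Keep in mind that a UNIQUE constraint on a column already creates an implicit index in SQLite, so the real table may not need an extra one.

```python
import sqlite3


def lookup_plan(conn, subtag):
    """Return SQLite's query-plan text for a lookup by subtag string."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT subtag_id FROM subtags WHERE subtag = ?",
        (subtag,),
    ).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Hypothetical stand-in table; the real client.master.db layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subtags (subtag_id INTEGER PRIMARY KEY, subtag TEXT)")
conn.executemany(
    "INSERT INTO subtags (subtag) VALUES (?)",
    [(f"tag{i}",) for i in range(1000)],
)

# Without an index the lookup has to scan the whole table.
plan_before = lookup_plan(conn, "tag500")

# With an index on the text column the lookup becomes an index search.
conn.execute("CREATE UNIQUE INDEX idx_subtags_subtag ON subtags (subtag)")
plan_after = lookup_plan(conn, "tag500")

print("before:", plan_before)
print("after: ", plan_after)
```

If the "before" plan on the real database already names an index, the column is covered and adding another one would only cost disk space and write speed.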
bd599a No.9659
>>9658
I am also expecting multi-threading could be a place where we can optimise the code (since most computers now run on 4/8 cores).
Perhaps SQLite, MD5/SHA hashing and de-duplication are not made for multi-core and/or GPU computers.
cfc291 No.9660
>>9659
>multi-threading
Python threads are all executed on the same core. That's why I said multiprocessing. It spreads out each subprocess across each core. Based on your post you don't know much about software, so think of a subprocess in python like a normal thread.
>are not made for multi-core and/or GPU computers
Everything you've mentioned can be easily sped up with multiple cores. Using a GPU would be even faster but there's no point in using that here. I'm actually pretty surprised he hasn't implemented multiprocessing functions in bottleneck situations like importing. It's very easy to split up the work once you've scanned all the files. You just divide them up by the number of cores and have each subprocess do that portion of the work. If you have 4 cores you have each core do 1/4 of the files you want to import.
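The split-by-core idea can be sketched roughly like this. It is not Hydrus code: SHA-256 hashing stands in for whatever per-file work an import really does, and `split_into_chunks`, `hash_file`, `hash_chunk` and `hash_files_parallel` are names made up for the example.

```python
import hashlib
from multiprocessing import Pool, cpu_count


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks consecutive slices of nearly equal size."""
    n_chunks = max(1, min(n_chunks, len(items) or 1))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def hash_file(path):
    """Return (path, SHA-256 hex digest), reading the file in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return path, digest.hexdigest()


def hash_chunk(paths):
    """Work done by one subprocess: hash every file in its share."""
    return [hash_file(path) for path in paths]


def hash_files_parallel(paths, workers=None):
    """Hash files using one subprocess per chunk (default: one per core)."""
    chunks = split_into_chunks(list(paths), workers or cpu_count())
    with Pool(processes=len(chunks)) as pool:
        results = pool.map(hash_chunk, chunks)
    # Flatten the per-chunk lists into a single {path: digest} mapping.
    return dict(pair for chunk in results for pair in chunk)
```

Call `hash_files_parallel(list_of_paths)` from under an `if __name__ == "__main__":` guard, since platforms that spawn subprocesses re-import the main module. Whether this beats a sequential loop depends on the disk: on a single HDD the reads may be the limit, not the CPU.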
bd599a No.9661
>>9660
>Python threads are all executed on the same core. That's why I said multiprocessing
Well, since people describe 4-core Intel CPUs with "hyperthreads" as having 8 virtual cores, I would say it is easy to get those things mixed up. If I have to use a proper term, Parallel Programming (as in Concurrency) would be more fitting.
>Everything you've mentioned can be easily sped up with multiple cores
I meant that it has not been implemented yet by the dev since (s/are not/has not been/)
>I'm actually pretty surprised he hasn't implemented multiprocessing functions
813085 No.9670
6bf834 No.9881
Considering the recent happenings of Tumblr and booru.org purges, it is important to put focus on alternative decentralization libraries.
1. free P2P software
a. BitTorrent - Most commonly used, but can't handle individual files
b. WebTorrent - WebRTC version of BitTorrent, but still has the same issue
c. eDonkey and GNUtella - both very obscure, not really useful or adaptive
d. IPFS - currently used in Hydrus, can handle singular files in a folder structure
2. Proxies and pseudo-VPNs
a. Tor - very common, maybe pozzed by CIA, has BitTorrent and IPFS compatibility (OpenBazaar)
b. I2P - less common, not pozzed, has BitTorrent compatibility, IPFS is in the works (go-i2p)
c. Freenet and Retroshare - both very uncommon, have file transferring and chats as a primitive
d. Zeronet - pretty dead, works with Javascript, too many unknowns
3. Blockchain data solutions (https://en.wikipedia.org/wiki/Cooperative_storage_cloud)
a. Filecoin - based in IPFS, slowly developing, could be used in conjunction with Hydrus
b. Sia - top data blockchain contender, has smart contracts with regular renewal for storage (https://sia.tech/)
c. MaidSafe - possible competition, includes secure communication and storage (https://maidsafe.net/)
d. Storj - noted, already has average pricing, made to be used alongside self-hosted cloud (https://storj.io/)
e. Ethereum Swarm - not really a good idea as the blockchain is congested by CryptoKitties
f. Others include https://decent.ch/ https://www.creativechain.org/ https://contentbox.one/ https://noia.network/
Others: https://cryptoslate.com/category/cryptos/storage/
6bf834 No.9882
4. Social media blockchain
a. Steem - used in alt-media like bitchute, dtube and steemit (https://steem.io/)
b. Rocketchat - used by the furries to communicate (https://rocket.chat/)
c. SocialX - at a whitepaper stage, to replace facebook and twitter (https://socialx.network/)
d. Akasha - based in IPFS, meant to replace Tumblr (https://akasha.world/)
e. BAT Token - used by Brave Browser (https://basicattentiontoken.org/)
Others https://foresting.io/ and https://sola.foundation/ and https://www.synereo.com/
https://www.stateofthedapps.com/dapps/tagged/social/tab/most-relevant
9f26dd No.9884
>>9881
>booru.org purges
What do you mean?
6bf834 No.9886
>>9884
Gelbooru and *.booru.org are hosted in the Netherlands, and they are using "anti-loli laws as an excuse" to force a purge on the admins.
833c67 No.10077
Do you know how I can convert the Hydrus db to PostgreSQL? The Hydrus db consists of multiple SQLite files, so how can I connect all of them?
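For the "connect all of them" part, SQLite's ATTACH DATABASE lets one connection see several database files at once, so tables from different files can be joined or dumped in one pass. A minimal sketch follows; the file names, tables, and the schema alias `master_db` are made up for the example and are not the real Hydrus layout. Getting the rows into PostgreSQL afterwards is a separate step that this does not cover.

```python
import os
import sqlite3
import tempfile

# Two throwaway database files standing in for the real ones.
workdir = tempfile.mkdtemp()
main_path = os.path.join(workdir, "main.db")
master_path = os.path.join(workdir, "master.db")

# Hypothetical "master" file holding the tag strings.
master = sqlite3.connect(master_path)
master.execute("CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT)")
master.execute("INSERT INTO tags VALUES (1, 'example')")
master.commit()
master.close()

# Hypothetical "main" file holding file-to-tag mappings by id.
conn = sqlite3.connect(main_path)
conn.execute("CREATE TABLE file_tags (file_id INTEGER, tag_id INTEGER)")
conn.execute("INSERT INTO file_tags VALUES (42, 1)")
conn.commit()  # ATTACH must run outside an open transaction

# Attach the second file under a schema name and join across both files.
conn.execute("ATTACH DATABASE ? AS master_db", (master_path,))
rows = conn.execute(
    "SELECT f.file_id, t.tag "
    "FROM file_tags AS f "
    "JOIN master_db.tags AS t ON t.tag_id = f.tag_id"
).fetchall()
print(rows)
```

Run this kind of thing on a copy of the database with the client closed, never on the live files.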
7d7b19 No.10232
9c2ceb No.10247
7d7b19 No.10272
6f19b2 No.10290
ec1fb1 No.10361
>>9281
https://vision.fe.uni-lj.si/cvww2016/proceedings/papers/04.pdf (Quantitative Comparison of Feature Matchers Implemented in OpenCV3)
https://sci-hub.tw/10.1109/m2vip.2016.7827292 (Comparison of OpenCV’s Feature Detectors and Feature Matchers)
ec1fb1 No.10362
978c9b No.10599
>>10361
Got some more comparative papers 4U
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8346440 (A Comparative Analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK)
1e8781 No.10742
https://en.wikipedia.org/wiki/Pointwise_mutual_information
Pointwise mutual information between tag X and tag Y is the logarithm of (num. of images with both tags) * (total image count) / ((num of images with tag X) * (num of images with Tag Y))
PMI can be used to find possible tag siblings
https://en.wikipedia.org/wiki/Conditional_entropy
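Conditional entropy of X given Y sums one term over each present/absent combination of the two tags; the term for images with both tags is ( (num. of images with both tags) / (total image count) ) * logarithm of ( (num of images with tag Y) / (num. of images with both tags) )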
CE can be used to find possible tag parents and children
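Both measures only need four counts per tag pair, so they are cheap to compute once the counts exist. A rough sketch, assuming the textbook definitions with natural logarithms; `pmi` and `conditional_entropy` are names made up here, and the conditional entropy is the full sum over all four present/absent combinations of the two tags.

```python
import math


def pmi(n_both, n_x, n_y, n_total):
    """Pointwise mutual information of tags X and Y from image counts.

    Requires n_both > 0; a pair that never co-occurs has no finite PMI.
    """
    return math.log((n_both * n_total) / (n_x * n_y))


def conditional_entropy(n_both, n_x, n_y, n_total):
    """H(X|Y) for two present/absent tags, from image counts."""
    # Each entry: (images in this X/Y combination, images with that Y state).
    cells = [
        (n_both, n_y),                                   # X present, Y present
        (n_y - n_both, n_y),                             # X absent,  Y present
        (n_x - n_both, n_total - n_y),                   # X present, Y absent
        (n_total - n_x - n_y + n_both, n_total - n_y),   # X absent,  Y absent
    ]
    total = 0.0
    for n_cell, n_given in cells:
        if n_cell > 0:  # empty combinations contribute nothing
            total += (n_cell / n_total) * math.log(n_given / n_cell)
    return total


# Two tags that always appear together: high PMI, zero conditional entropy.
print(pmi(50, 50, 50, 100), conditional_entropy(50, 50, 50, 100))
# Two statistically independent tags: PMI of zero.
print(pmi(25, 50, 50, 100), conditional_entropy(25, 50, 50, 100))
```

Any thresholds for calling two tags siblings or parent/child would have to be tuned on real tag data.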
b990dc No.10805
Nim is low-level Python, Crystal is low-level Ruby, both would be easy for the rest of us (and hopefully the dev) to pick up.
Doing so would mean that Hydrus would be at least twice as fast in certain departments when compared to non-NumPy Python.
(Also D is a C replacement, Go and Kotlin are Java replacements, but those are very different from the syntax of Python)
Are there applications where low-level languages DON'T apply? Math calculations, in that case use SciPy/NumPy for less work.
Some benchmarks:
https://github.com/kostya/benchmarks
https://github.com/drujensen/fib
https://github.com/frol/completely-unscientific-benchmarks
https://github.com/logicchains/LPATHBench
aa7425 No.10819
b990dc No.11022
b990dc No.11023
a72330 No.11053
>>10290
>https://github.com/acoustid/acoustid-index (C++)
You're looking for https://github.com/acoustid/chromaprint (C++)
To be honest though when Hydrus starts doing audio fingerprinting it should probably just use acoustid so it can grab tags from MusicBrainz ( https://musicbrainz.org/ )
2f2eb0 No.11058
>>11053
Or maybe others as well? What if we are getting music from torrents instead and don't want MusicBrainz to know that we got them?
Bumping to spark conversation
>>10232
http://www.scitepress.org/Papers/2016/59263/59263.pdf (Performance Evaluation of Phonetic Matching Algorithms on English Words and Street Names)
More benchmarks for major phonetic algorithms
5cfb09 No.11133
>>9068
>PostgreSQL - An SQL with complex features with less performance
1998 wants its retard memes back.
e73dfb No.11204
1e8781 No.11206
>>11204
How so? Too many onyomi and kunyomi? Even then if we are not using phonetic fuzzy search, string fuzzy search can still be used (see https://en.wikipedia.org/wiki/String_metric)
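As one concrete string metric, Levenshtein edit distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. A small sketch; `levenshtein` and `closest_tags` are names made up for the example, and the cutoff of 2 edits is an arbitrary assumption.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def closest_tags(query, tags, max_distance=2):
    """Return tags within max_distance edits of query, nearest first."""
    scored = sorted((levenshtein(query, tag), tag) for tag in tags)
    return [tag for distance, tag in scored if distance <= max_distance]


print(closest_tags("blue eys", ["blue eyes", "blue hair", "red eyes"]))
```

It works on Unicode code points, so it does not care about readings at all, though comparing a query against every tag this way gets slow on a big tag list.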
ac7c72 No.11380
d46cda No.11586
f06e36 No.11927
0a99e5 No.12295
0b5902 No.12302
>>12295
Why don't you actually develop something on your own instead of endlessly shitting out github links
0a99e5 No.12307
>>12302
Nah that is for >>12277