[ / / / / / / / / / / / / / ] [ dir / random / bane / clang / hentai / htg / loomis / mde / pone / rule34 ]

/hydrus/ - Hydrus Network

Bug reports, feature requests, and other discussion for the hydrus network.
Name
Email
Subject
Comment *
File
Password (Randomized for file and post deletion; you may also set your own.)
Archive
* = required field[▶ Show post options & limits]
Confused? See the FAQ.
Embed
(replaces files and can be used instead)
Voice recorder Show voice recorder

(the Stop button will be clickable 5 seconds after you press Record)
Options

Allowed file types:jpg, jpeg, gif, png, webm, mp4, swf, pdf
Max filesize is 16 MB.
Max image dimensions are 15000 x 15000.
You may upload 5 per post.


New user? Start here ---> http://hydrusnetwork.github.io/hydrus/

Experienced user with a bit of cash who wants to help out? ---> Patreon

Current to-do list has: 2,017 items

Current big job: Catching up on Qt, MPV, tag work, and small jobs. New poll once things have calmed down.


96322f  No.8844

Hello folks, hydrus_dev in particular.

tl;dr: I want to figure out what tags occur most frequently, particularly in my inbox. I have around 800,000 files in my inbox and unless I'm an idiot, Hydrus can't handle this easily. I think I need to do some SQL fuckery. I have backups of my DBs so messing with copies isn't a problem.

Full read: I want to see the frequency of certain tags in my Hydrus, both for cleaning it up and also just to see what I have accumulated the most of.

I'm not precisely an idiot at this stuff, I routinely futz around with Access for my job, but I'm not a programmer in any sense of the word. I have SQLite Browser, MySQL Workbench, and DBeaver installed. I have no clue how to write a proper SQL query, especially since Hydrus' DBs are partitioned in such a way that it's difficult for me to wrap my head around it.

I would ideally use Access, plug in the DBs, and do Design View but, well, I don't have Access on my personal computer.

Is there any way to do this, or advice, or really whatever? I don't think I need much handholding, once I've figured it out once I'm good, but at this point I'm overwhelmed and don't know where to start.

Thanks in advance.

____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

67e898  No.8870

File: 91d56c8bde0c49d⋯.png (445.93 KB, 711x711, 1:1, 91d56c8bde0c49da005d1a4187….png)

File: c09fbebbf426a76⋯.png (32.52 KB, 971x435, 971:435, top ptr tags.png)

File: f2b2f7d12a5c63f⋯.png (27.26 KB, 217x190, 217:190, f2b2f7d12a5c63fb1c416733b8….png)

Hey, this sounds interesting. I've done similar a couple of times just for fun to report to people the top tags–which are usually some combination of 'female', 'breasts', '1girl', and 'touhou'.

I expect to have some gui out at some point that'll present this to you as a tag cloud, or as a new tab of popular tags on the tag autocomplete, but if you want to do it in SQLite, you'll want something like this:

Open the sqlite3 executable in your db folder and type (you should be able to copy/paste this):

.open client.db
ATTACH "client.caches.db" as cach;
ATTACH "client.master.db" as mast;
SELECT service_id, name FROM services;
(make a note of the file service id and tag service id you want to harvest from)

For popular tags on a service:

SELECT current_count, namespace, subtag FROM combined_files_ac_cache_(tag_service_id) NATURAL JOIN tags NATURAL JOIN namespaces NATURAL JOIN subtags ORDER BY current_count DESC LIMIT 25;

For popular tags on a service cross-references with your files (typically, you'll want your 'my files' service):

SELECT current_count, namespace, subtag FROM specific_ac_cache_(file_service_id)_(tag_service_id) NATURAL JOIN tags NATURAL JOIN namespaces NATURAL JOIN subtags ORDER BY current_count DESC LIMIT 25;

Replace the id just as the number, like 'specific_ac_cache_2_4'.

When you are done, go:

.exit

If you want, you can add a 'namespace="series"' just before the ORDER BY to specify just series tags and so on.

>5 million 'female' tags

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

5adc71  No.8872

>>8870

Haha, I am sitting at a hefty 2 million female tags, go me.

How can I change this query to specifically find, say namespace "pixiv_id" in the inbox? I'm not having much luck changing my "specific_ac_cache" whatever that is.

Thanks for the help so far, though!

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

fa5f35  No.8878

>>8872

SELECT current_count, namespace, subtag FROM combined_files_ac_cache_3 NATURAL JOIN tags NATURAL JOIN namespaces NATURAL JOIN subtags WHERE namespace="pixiv_id" ORDER BY current_count DESC LIMIT 25;

This was able to print mostly what I wanted, I still don't know how to narrow it down to inbox only. This is a lot of fun, figuring this out though.

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

67e898  No.8880

File: d7bd65e370fe487⋯.jpg (1.37 MB, 1429x1757, 1429:1757, d7bd65e370fe487d07fb5e5d5b….jpg)

>>8878

>>8872

Ah, sorry, I missed the inbox part in my first read.

Inbox is going to be more complicated. The numbers I am pulling here are from my 'autocomplete cache' for difference file/tag service cross-sections. I pre-compute these numbers and increment/decrement them as the underlying tags and files change, which also makes them easy for quick selection and sort. But it also means they are service-wide.

If you want to get a smaller slice, you'll want to count up the tags manually, which could take a few minutes.

It is probably doable to try and count all that shit simultaneously to sorting and displaying the result, but it'll be a headache and take ages to see if the query is wrong, so I would recommend something like this:

WARNING: This now has CREATE and INSERT! It should not write anything to your db, but it could if you mistype something or mess up somehow! Make sure you back up your db files before you start the session!

ATTACH "client.mappings.db" as mapp;

CREATE TEMP TABLE t_count ( tag_id INTEGER PRIMARY KEY, current_count INTEGER );

INSERT INTO t_count SELECT tag_id, COUNT( * ) FROM current_mappings_table_name (magic selective join phrase) GROUP BY tag_id;

SELECT current_count, namespace, subtag FROM t_count NATURAL JOIN tags NATURAL JOIN namespaces NATURAL JOIN subtags ORDER BY current_count DESC LIMIT 25;

Your current_mappings_table_name should be current_mappings_2 or whatever, where 2 is your tag service service_id. These tables are stored in current_mappings.db, so make sure you do add the new attach (along with caches and master as before) to your session.

The (magic selective join phrase) is anything that intersects the large mappings pool with a smaller subset of files or tags. For you, it is probably going to be:

NATURAL JOIN file_inbox

The mappings table is pairs of (hash_id, tag_id), and files_inbox is a just a list of (hash_id), so the join there will only count up tags that apply to a file in the inbox. If you want to get more complicated, check out client.db's files_info and current_files tables in SQLiteStudio or some other ui.

The temp table will be deleted as soon as you .exit.

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

fa5f35  No.8882

File: 31cc2734e55987d⋯.png (462.1 KB, 309x509, 309:509, based_dev.png)

>>8880

This works so great, thank you. Fucking around with databases is oddly satisfying, can't wait to ruin something. I'm working from backups, though, so I'm not particularly worried.

Here's the complete code that I figured out, 99% of this is your work of course:

.open client.db
ATTACH "client.caches.db" as cach;
ATTACH "client.master.db" as mast;
ATTACH "client.mappings.db" as mapp;
SELECT service_id, name FROM services;
CREATE TEMP TABLE t_count ( tag_id INTEGER PRIMARY KEY, current_count INTEGER );

INSERT INTO t_count SELECT tag_id, COUNT( * ) FROM current_mappings_3

NATURAL JOIN file_inbox GROUP BY tag_id; SELECT current_count, namespace, subtag FROM t_count

NATURAL JOIN tags NATURAL JOIN namespaces NATURAL JOIN subtags ORDER BY current_count DESC LIMIT 25;

.open client.db
ATTACH "client.caches.db" as cach;
ATTACH "client.master.db" as mast;
ATTACH "client.mappings.db" as mapp;
SELECT service_id, name FROM services;
CREATE TEMP TABLE t_count ( tag_id INTEGER PRIMARY KEY, current_count INTEGER );

INSERT INTO t_count SELECT tag_id, COUNT( * ) FROM current_mappings_3

NATURAL JOIN file_inbox GROUP BY tag_id; SELECT current_count, namespace, subtag FROM t_count

NATURAL JOIN tags NATURAL JOIN namespaces NATURAL JOIN subtags WHERE namespace="SUBTAG" ORDER BY current_count DESC LIMIT 25;

This is what I am using to pull more specific data. I really appreciate you taking the time to show me this, I do think I'm going to go a bit ham with it.

Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.



[Return][Go to top][Catalog][Nerve Center][Random][Post a Reply]
Delete Post [ ]
[]
[ / / / / / / / / / / / / / ] [ dir / random / bane / clang / hentai / htg / loomis / mde / pone / rule34 ]