/comms/ - COMMS

anons talking with anons to anons about anons anonymously

af3481 No.20526

Basic writeup on how to archive 8kun boards to be augmented/discussed further in this thread:

Manual archiving (for small boards)

1. go to the catalog

2. open each thread in the catalog in a new tab

3. for each thread, click to expand all images (this part can be done automatically using a script in the browser console)

4. use the Save Page WE browser extension to grab a complete archive of the thread with all full-size images included in a .html file. I haven't tried to see if this works with video attachments, and I know it won't work with pdfs.

Automated archiving

1. request the board catalog to get the list of active threads

2. make requests to each individual thread in the catalog, optionally based on the time each thread was last updated

2a. save the information for each post into a database, and mark posts that are no longer in the thread as deleted

2b. look through all the media files in each new post and download the full version of them (if not already done)

3. repeat

4. periodically review media files that are present in deleted posts and decide if they should be kept or not. it is hard for a program to tell the difference between content that was deleted by mods and things that disappeared due to site errors.
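The steps above can be sketched in Python. This is a minimal sketch of steps 1 through 2a, not the OP's implementation. It assumes 8kun serves vichan-style JSON: `/{board}/catalog.json` as a list of pages whose `threads` entries carry `no` and `last_modified`, and `/{board}/res/{no}.json` with a `posts` list. Those URLs and field names are assumptions to check against the live site. The table layout and all function names are hypothetical.

```python
import json
import sqlite3
import urllib.request

BASE = "https://8kun.top"  # assumption: board JSON is served from the main site


def fetch_json(url):
    """Download and parse one JSON document (needs network access)."""
    req = urllib.request.Request(url, headers={"User-Agent": "board-archiver-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def open_db(path=":memory:"):
    """Create (if needed) a tiny schema: one row per post, one row per thread."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS posts (board TEXT, thread_no INTEGER, post_no INTEGER,"
        " data TEXT, deleted INTEGER DEFAULT 0, PRIMARY KEY (board, post_no))"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS threads (board TEXT, thread_no INTEGER,"
        " last_modified INTEGER, PRIMARY KEY (board, thread_no))"
    )
    return db


def threads_to_refresh(db, board, catalog):
    """Step 2: only threads that are new or updated since we last stored them."""
    stale = []
    for page in catalog:
        for t in page.get("threads", []):
            modified = t.get("last_modified", 0)
            row = db.execute(
                "SELECT last_modified FROM threads WHERE board=? AND thread_no=?",
                (board, t["no"]),
            ).fetchone()
            if row is None or modified > row[0]:
                stale.append((t["no"], modified))
    return stale


def sync_thread(db, board, thread_no, last_modified, posts):
    """Step 2a: store every post received; flag stored posts that have vanished."""
    seen = set()
    for p in posts:
        seen.add(p["no"])
        # REPLACE overwrites older data and clears the deleted flag if a post comes back.
        db.execute(
            "INSERT OR REPLACE INTO posts (board, thread_no, post_no, data, deleted)"
            " VALUES (?, ?, ?, ?, 0)",
            (board, thread_no, p["no"], json.dumps(p)),
        )
    stored = db.execute(
        "SELECT post_no FROM posts WHERE board=? AND thread_no=? AND deleted=0",
        (board, thread_no),
    ).fetchall()
    for (post_no,) in stored:
        if post_no not in seen:
            db.execute(
                "UPDATE posts SET deleted=1 WHERE board=? AND post_no=?", (board, post_no)
            )
    db.execute(
        "INSERT OR REPLACE INTO threads (board, thread_no, last_modified) VALUES (?, ?, ?)",
        (board, thread_no, last_modified),
    )
    db.commit()


def archive_once(db, board):
    """Steps 1 and 2 for one board; call repeatedly for step 3."""
    catalog = fetch_json(f"{BASE}/{board}/catalog.json")
    for thread_no, modified in threads_to_refresh(db, board, catalog):
        thread = fetch_json(f"{BASE}/{board}/res/{thread_no}.json")
        sync_thread(db, board, thread_no, modified, thread["posts"])
```

Step 3 could be a loop such as `while True: archive_once(db, "comms"); time.sleep(300)`. Left out on purpose: step 2b (media downloads), threads that drop off the catalog entirely, and threads deleted between the catalog request and the thread request (`urlopen` raises `HTTPError` for a 404).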

____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

219626 No.20569

Keeping the points of the manual archiving method in mind:

Tried the FireShot plugin in Firefox. It worked as desired for capturing a single complete thread, using PNG format.

Tried the MozArchive plugin for saving a single thread. It works as desired in both MAFF and MHT formats.

Saving as PDF in FireShot did not work correctly; it produced a blank PDF.

As long as all the images are expanded before saving, these methods are sufficient for small boards such as mine. Quick and dirty, and it works.

For something like /comms/ or the other community boards, the automated archiving avenue would be super beneficial, imo. I'd use it just because Cadillacs are cool, too.

219626 No.20575

>>20569

Looks like an update to FireShot fixed the save-as-PDF problem.

af3481 No.20576

I use an extension called "Save Page WE":

https://addons.mozilla.org/en-US/firefox/addon/save-page-we/

Works well, and it's much better than a PDF or PNG because there is no data loss this way: the page is kept as real HTML instead of being flattened into a picture or a print layout.

MozArchive with MAFF or MHT should be fine too but I like Save Page WE because it creates a plain .html file with all images inlined into it.
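To show why a single inlined .html file keeps everything, here is a minimal Python sketch of the general idea: local image files referenced by `src="..."` are swapped for base64 `data:` URIs, so the page carries its own images. This is not how Save Page WE is implemented (the extension also handles stylesheets, fonts, and more); the function name and the simple regex are assumptions for illustration.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Simplistic on purpose: only double-quoted src attributes are handled.
SRC_RE = re.compile(r'src="([^"]+)"')


def inline_images(html, base_dir):
    """Replace src="local/file" with a data: URI; leave remote or missing refs alone."""
    base = Path(base_dir)

    def to_data_uri(match):
        ref = match.group(1)
        if ref.startswith(("http:", "https:", "data:")):
            return match.group(0)  # remote or already inlined: keep as-is
        path = base / ref
        if not path.is_file():
            return match.group(0)  # missing locally: keep the original reference
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f'src="data:{mime};base64,{payload}"'

    return SRC_RE.sub(to_data_uri, html)
```

The trade-off is size: base64 makes each image about a third larger, but the result is one file that opens anywhere without its media folder.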

I've done the automated archive thing too. What I described works, but my version is not currently in a state where it would be of any use to others if I released it publicly.

219626 No.21054

>>20576

MAFF files can be opened in WinZip (a MAFF is just a ZIP archive), kek
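Because a MAFF file is an ordinary ZIP archive, any ZIP tool can read it, not just WinZip. A minimal Python sketch using only the standard `zipfile` module; the function names are hypothetical.

```python
import zipfile


def list_maff(path):
    """Return the names of all entries stored in a MAFF (ZIP) archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def extract_maff(path, dest):
    """Unpack the whole archive into dest so the saved page can be opened directly."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(dest)
```

Usage would be something like `print(list_maff("thread.maff"))`, where `thread.maff` is a hypothetical file name.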

24d34a No.21652

>>21608 (off-bread)

>using wget for archiving

This works. The resulting HTML page isn't immediately browsable (no styles, and links to media are broken), but it does grab all posts and media files:


wget https://8kun.top/comms/res/21322.html --no-clobber --recursive --level=1 --span-hosts --domains=media.8kun.top --wait=0.3 --random-wait

Because of --no-clobber, this will not re-download files that already exist, which is what you want for media but not for the thread HTML file. So if you want to re-archive a thread (like when it gets new posts), find the old .html file and delete or rename it before running that command.
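A minimal Python sketch that automates the re-archive step: it moves the old thread .html aside with a timestamp suffix, then runs the same wget command. Passing the arguments as a list also avoids stray line breaks inside options. It assumes wget's default `--recursive` layout, which saves the page under `<host>/<path>` (here `8kun.top/comms/res/21322.html`) relative to the working directory; the function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path
from urllib.parse import urlsplit


def wget_args(thread_url):
    """The wget command from the post, as a list (no shell quoting, no line breaks)."""
    return [
        "wget", thread_url,
        "--no-clobber", "--recursive", "--level=1",
        "--span-hosts", "--domains=media.8kun.top",
        "--wait=0.3", "--random-wait",
    ]


def local_copy(thread_url, root="."):
    """Where wget --recursive is assumed to save the page: <root>/<host>/<path>."""
    parts = urlsplit(thread_url)
    return Path(root) / parts.netloc / parts.path.lstrip("/")


def rearchive(thread_url, root="."):
    """Rename the old thread HTML so --no-clobber fetches a fresh copy; media is kept."""
    old = local_copy(thread_url, root)
    if old.exists():
        old.rename(old.with_name(f"{old.stem}.{int(time.time())}{old.suffix}"))
    subprocess.run(wget_args(thread_url), cwd=root, check=True)
```

Renaming rather than deleting keeps every earlier snapshot of the thread, so posts that were later deleted still survive in an older copy.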

other stuff that could be improved:

- This will download both thumbnails and full versions of media files. I think the --accept-regex option is the way to fix this.

- This might not catch some errors like corrupted media files on its own (sometimes downloads fail partway through, and they probably won't be re-downloaded).
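Both issues can be sketched in Python, under two loudly stated assumptions: that thumbnails are served from a path containing `/thumb/` on media.8kun.top, and that full-size files are named after the SHA-256 hash of their content (64 hex characters plus an extension). Check both against real URLs before relying on them. If they hold, a partial download no longer matches its own name. The same thumbnail pattern could probably be given to wget's `--reject-regex` instead of building an accept pattern. All names here are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Assumption: thumbnail URLs contain a "/thumb/" path segment.
THUMB_RE = re.compile(r"/thumb/")
# Assumption: full-size files are named <sha256 hex of content>.<ext>.
HASH_NAME_RE = re.compile(r"^([0-9a-f]{64})\.[A-Za-z0-9]+$")


def is_thumbnail(url):
    return THUMB_RE.search(url) is not None


def looks_intact(path):
    """True/False if the content hash matches the name; None if the name carries no hash."""
    path = Path(path)
    m = HASH_NAME_RE.match(path.name)
    if m is None:
        return None  # can't tell from the name alone
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest() == m.group(1)


def find_broken(media_dir):
    """Downloaded full-size files whose content doesn't match their name."""
    return [
        p for p in Path(media_dir).rglob("*")
        # Thumbnails are skipped: if they reuse the full file's name, they would never match.
        if p.is_file() and "thumb" not in p.parts and looks_intact(p) is False
    ]
```

Anything `find_broken` returns can be deleted and fetched again; with --no-clobber, wget will then only re-download the missing files.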

24d34a No.21655

>>21652

That line break in the middle of "--level=1" is not supposed to be there.

fixed command here:

https://www2.qanonbin.com/paste/1H1Wur2Fv
