
/prog/ - Programming

File: 1453177111172.png (94.86 KB, 337x450, 403.png)

c6be61 (4) No.3852 >>3882 >>3883

I wrote a python script to download images from tinyboard/vichan imageboards.

It works on every imageboard I try except 8ch, which gives me a 403 forbidden error. I tried changing my user agent within the script (perhaps unsuccessfully), but still 403. What gives?


#!/usr/bin/env python3

import argparse, bs4, os, urllib.request, urllib.parse

parser = argparse.ArgumentParser()
parser.add_argument("url", help="Link to thread")
parser.add_argument("-d", help="Directory to download to")
args = parser.parse_args()

# Create the download directory if needed, then work inside it.
if args.d:
    if not os.path.exists(args.d):
        os.makedirs(args.d)
    os.chdir(args.d)

# Fetch and parse the thread page.
soup = bs4.BeautifulSoup(urllib.request.urlopen(args.url))

# Scheme and host of the thread URL, used to build image URLs.
domain = urllib.parse.urlparse(args.url).netloc
http = urllib.parse.urlparse(args.url).scheme + "://"

# The file link is expected to be the element right after each <p class="fileinfo">.
for link in soup.find_all("p", class_="fileinfo"):
    image = http + domain + link.next_sibling.get("href")
    filename = image.rsplit("/", 1)[1]
    # Skip files that were already downloaded.
    if not os.path.exists(filename):
        urllib.request.urlretrieve(image, filename)
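On the user agent question: some servers reject urllib's default Python-urllib User-Agent, though whether that is what 8ch is doing here is only a guess. A rough sketch of two ways to set the header with urllib.request follows. The USER_AGENT string and both function names are made up for illustration, not taken from the script.

```python
import urllib.request

# Hypothetical value: any browser-like string would do; not taken from the thread.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def make_request(url, user_agent=USER_AGENT):
    """Build a Request carrying an explicit User-Agent, for a single urlopen() call."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def install_user_agent(user_agent=USER_AGENT):
    """Install a global opener so later urlopen() and urlretrieve() calls send the header."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener
```

Calling install_user_agent() once near the top of the script should cover both the page fetch and the image downloads, since urlretrieve() goes through the installed opener. Setting a header on a Request object alone would not affect urlretrieve(image, filename) when it is given a plain URL string. Untested against 8ch, so it may not clear the 403.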

____________________________
Disclaimer: this post and the subject matter and contents thereof - text, media, or otherwise - do not necessarily reflect the views of the 8kun administration.

c6be61 (4) No.3853

Note that I accidentally removed a parenthesis to close the very last line.

- urllib.request.urlretrieve(image, filename

+ urllib.request.urlretrieve(image, filename)

c6be61 (4) No.3882

>>3852 (OP)

If I had to guess,

>image = http + domain + link.next_sibling.get("href")

The href of the link is https://media.8ch.net/prog/src/1453177111172.png on 8chan, which is already a full URL. Look at it in a page inspector.

I think it's because hotwheels has some weird hacky shit with the servers due to bui and/or site growth.

You might have figured this out by yourself already, since this was a while ago.

I might try writing my own version in javascript or bash for 8chan specifically. I'll prob post it here if I do
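If the href really is absolute on 8chan and relative on other boards, urllib.parse.urljoin handles both cases, so the script would not need to glue http + domain onto it. A small sketch, assuming that diagnosis is right. The function name and the thread URL below are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_image_url(page_url, href):
    """Return an absolute URL for href, whether it is relative or already absolute."""
    return urljoin(page_url, href)


# Hypothetical thread URL, for illustration only.
page = "https://8ch.net/prog/res/3852.html"

# Absolute href (the 8chan case): returned unchanged.
print(resolve_image_url(page, "https://media.8ch.net/prog/src/1453177111172.png"))

# Relative href (the case the original script assumes): resolved against the page URL.
print(resolve_image_url(page, "/prog/src/1453177111172.png"))
```

The first call prints the media.8ch.net URL as is, and the second prints https://8ch.net/prog/src/1453177111172.png. In the script this would replace the line quoted above with image = urljoin(args.url, link.next_sibling.get("href")).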



