
/tech/ - Technology


 No.1045204>>1045750

>tries to download 267 vids

>uses youtube-dl

>has 1TB of storage so he doesn't have to worry

>leaves it running overnight

I woke up the next morning to find I could only download 136 of his videos because it disconnected :(

 No.1045207>>1045222 >>1045640

try youtube dl gui


 No.1045220>>1045750 >>1047079

-i
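
(i.e. the --ignore-errors flag; a minimal sketch of the suggestion, with a placeholder channel URL, not the OP's:)

# -i (--ignore-errors) makes youtube-dl skip videos that fail and keep going,
# so one broken video doesn't abort an overnight batch download.
youtube-dl -i 'https://www.youtube.com/channel/CHANNEL_ID_HERE'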


 No.1045222

this op >>1045207, last time i scraped defcon vids it worked great


 No.1045223>>1045257 >>1045259 >>1045262

>tries to do a thing

>not familiar enough with tool and task to expect obvious problem

>doesn't read manual

>very disappointed with result

>educated by /tech/

<creates a PR to make -i the default on downloads of large numbers of videos, with a banner on exit to explain this to the user


 No.1045257>>1045335

>>1045223

You know what, you can insult me for not looking at the manual all you want, but I'm not gonna download any more because I just saw the folder size: 72GB. Yeah, I know he's dead and all, but I'm not backing up any more.


 No.1045259

>>1045223

136 videos of playing Doom and Rainbow 6 is where it stays


 No.1045262

>>1045223

If I keep downloading shit at that rate I won't have space for all of my neko porn


 No.1045335>>1045382

>>1045257

It wouldn't have helped you to look at the manual, because you still wouldn't have expected this problem.

You will next time.


 No.1045382

>>1045335

Yeah, I probably should have done that, and I probably should have logged in as well. I think there's an option that prevents you from being kicked if you're logged in.
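
(If you mean authenticating the requests, youtube-dl does have login options; whether being logged in would actually have prevented the disconnect is a guess, and cookies.txt here is assumed to be exported from a logged-in browser session:)

# reuse an existing browser session via an exported cookies.txt
youtube-dl -i --cookies 'cookies.txt' 'https://www.youtube.com/channel/CHANNEL_ID_HERE'

# or pass account credentials directly (two-factor auth may get in the way)
youtube-dl -i -u 'you@example.com' -p 'hunter2' 'https://www.youtube.com/channel/CHANNEL_ID_HERE'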


 No.1045640

>>1045207

gui is bloat


 No.1045734

Just because you can't use it right, u faggot


 No.1045750>>1045810 >>1047077

>>1045204 (OP)

Reminded me of the hell that was downloading a channel with 500+ videos. But yeah, as >>1045220 said, you just need to pass the -i flag which will ignore the errors and continue to download other videos.

Using it from within Python gives you more control but if you don't want to go that deep, then you could use something like this:

CHANNEL='UCD6VugMZKRhSyzWEWA9W2fg'
youtube-dl \
    --no-progress --no-warnings \
    -i --ignore-config --prefer-ffmpeg \
    -R 'infinite' --fragment-retries 'infinite' \
    --abort-on-unavailable-fragment --geo-bypass --no-check-certificate \
    --match-filter "channel_id = '${CHANNEL}'" --playlist-reverse \
    -w -o '%(upload_date)s.%(id)s/%(id)s.%(ext)s' -f 'bestvideo+bestaudio/best' \
    --download-archive 'ARCHIVE' --merge-output-format 'mkv' \
    --no-continue --write-info-json --write-thumbnail --all-subs \
    -- "https://www.youtube.com/channel/${CHANNEL}" "https://www.youtube.com/channel/${CHANNEL}/playlists" \
    2>&1 | tee "youtube-dl.log"
This will first download all the videos on the channel and then go through its playlists and download every video that belongs to this channel (useful when the channel keeps unlisted videos in some playlist), ignoring all errors. Downloaded videos get recorded in the ARCHIVE file, so you can skip the already downloaded videos and easily update the archive as the channel uploads new videos, or retry the download if some videos weren't downloaded due to errors. All videos will be in the best possible quality, contained within Matroska (.mkv). Besides the videos themselves, this will also download the metadata in the form of a JSON file (.info.json), the thumbnail at maximum resolution, and all available closed captions or subtitles.

The result would look something like this:

.
├── 20190127.0FW23bamIZI
│   ├── 0FW23bamIZI.info.json
│   ├── 0FW23bamIZI.jpg
│   └── 0FW23bamIZI.mkv
├── 20190226.wXo24imR_54
│   ├── wXo24imR_54.info.json
│   ├── wXo24imR_54.jpg
│   └── wXo24imR_54.mkv
├── 20190317.URJ_qSXruW0
│   ├── URJ_qSXruW0.info.json
│   ├── URJ_qSXruW0.jpg
│   └── URJ_qSXruW0.mkv
└── ARCHIVE
The only problem with this way of doing things is that when scanning the playlists for videos that belong to this channel, youtube-dl has to first download each video's page and check its metadata, which is slow and a waste of traffic. On top of that, there's no internal mechanism to prevent youtube-dl from analyzing already rejected videos. What you can do is get all the rejected videos from youtube-dl.log and keep them in the ARCHIVE file, so they'll be ignored entirely when you update the archive.

They all have the same message:

[youtube] 8UD50tPFCYo: Downloading webpage
[youtube] 8UD50tPFCYo: Downloading video info webpage
[download] Mario Tennis does not pass filter channel_id = 'UCPcIwIn5WO6_o_vXF8SXx3w', skipping ..
You can also do the same thing with the videos that weren't downloaded due to copyright errors or some other shit. Just find them in the log and write them into ARCHIVE before reattempting to archive the channel.
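
Rough sketch of scraping those IDs out of the log, assuming the log format shown above (an "[youtube] <id>: ..." line before each skip message); the copyright-blocked ones would need whatever error string they produce instead of the filter message:

# remember the last video ID youtube-dl announced; when a later line says it was
# skipped by the channel_id filter, emit it in the "youtube <id>" format that
# --download-archive expects, then de-duplicate ARCHIVE in place
awk '/^\[youtube\] / { id = $2; sub(/:$/, "", id) }
     /does not pass filter/ { print "youtube", id }' youtube-dl.log >> ARCHIVE
sort -u -o ARCHIVE ARCHIVE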

Personally, I just kept an EXCLUDE file with all the videos I don't need, and each time I needed to update a channel, before actually starting youtube-dl, I would switch to the directory and do something like this:

jq -r '"youtube \(.id)"' */*.info.json | cat - EXCLUDE > ARCHIVE
This is how I used to do it before switching to Python.


 No.1045810>>1045847

>>1045750

Forgot to mention, YouTube channels have an RSS feed (https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL}) so you can have, for example, a Cron job that checks the feed for new videos and archives them.
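
Rough sketch of that idea (untested; it assumes the feed still exposes <yt:videoId> elements and reuses the CHANNEL and ARCHIVE from the post above):

# check-feed.sh: pull the channel's RSS feed, extract the video IDs, and hand
# anything not already recorded in ARCHIVE to youtube-dl.
CHANNEL='UCD6VugMZKRhSyzWEWA9W2fg'
curl -s "https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL}" \
    | grep -o '<yt:videoId>[^<]*' | sed 's/<yt:videoId>//' \
    | while read -r id; do
        grep -qx "youtube ${id}" ARCHIVE 2>/dev/null && continue
        youtube-dl -i --download-archive 'ARCHIVE' \
            -o '%(upload_date)s.%(id)s/%(id)s.%(ext)s' -f 'bestvideo+bestaudio/best' \
            --merge-output-format 'mkv' \
            -- "https://www.youtube.com/watch?v=${id}"
    done

# then in crontab, e.g. run it hourly:
# 0 * * * * cd /path/to/archive && ./check-feed.sh >> cron.log 2>&1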


 No.1045847

>>1045810

>YouTube channels have an RSS feed

Not all of them have it enabled, but most do.


 No.1047077

>>1045750

I feel like you could get around the duplicate video issue by simply doing a run grabbing video URLs only, feeding them into a file, and then filtering out duplicates. From there, do a second run using the sorted file as input.

If there are no duplicates (i.e., you're not searching multiple playlists and sources) it will run a little longer, but it will save you the hassle of re-downloading the webpage multiple times when there is a possibility of duplicates.
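
Something like this, maybe (untested sketch reusing the CHANNEL variable from above; I've dropped the channel_id filter for simplicity, and --get-id still fetches each video page on the first pass, just only once per unique video):

# Pass 1: list every video ID from the channel and its playlists, de-duplicate,
# and turn the IDs back into watch URLs for the batch file.
youtube-dl -i --get-id -- \
    "https://www.youtube.com/channel/${CHANNEL}" \
    "https://www.youtube.com/channel/${CHANNEL}/playlists" \
    | sort -u | sed 's#^#https://www.youtube.com/watch?v=#' > urls.txt

# Pass 2: download from the de-duplicated list.
youtube-dl -i -a urls.txt --download-archive 'ARCHIVE' \
    -o '%(upload_date)s.%(id)s/%(id)s.%(ext)s' -f 'bestvideo+bestaudio/best' \
    --merge-output-format 'mkv'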


 No.1047079

>>1045220

/thread


 No.1055111

HAPAS ARE SUPERIOR TO WHITES


 No.1057519

Yeah, right, and the moon is made of cheese.



