
/tech/ - Technology


 No.1045204>>1045750

>tries to download 267 vids

>uses youtube-dl

>has 1TB of storage so he doesn't have to worry

>leaves it running overnight

I woke up the next morning to find I could only download 136 of his videos because it disconnected :(

 No.1045207>>1045222 >>1045640

try youtube dl gui


 No.1045220>>1045750 >>1047079

-i
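
(i.e. the --ignore-errors flag; a minimal sketch of the suggestion, with a placeholder channel URL, not the OP's:)

# -i (--ignore-errors) makes youtube-dl skip videos that fail and keep going,
# so one broken video doesn't abort an overnight batch download.
youtube-dl -i 'https://www.youtube.com/channel/CHANNEL_ID_HERE'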


 No.1045222

this op >>1045207, last time i scraped defcon vids it worked great


 No.1045223>>1045257 >>1045259 >>1045262

>tries to do a thing

>not familiar enough with tool and task to expect obvious problem

>doesn't read manual

>very disappointed with result

>educated by /tech/

<creates a PR to make -i the default on downloads of large numbers of videos, with a banner on exit to explain this to the user


 No.1045257>>1045335

>>1045223

You know what, you can insult me for not looking at the manual all you want, but I'm not gonna download any more because I just saw the folder size: 72GB. Yeah, I know he's dead and all, but I'm not backing up any more.


 No.1045259

>>1045223

136 videos of playing Doom and Rainbow 6 is where it stays


 No.1045262

>>1045223

If I keep downloading shit at that rate I won't have space for all of my neko porn


 No.1045335>>1045382

>>1045257

It wouldn't have helped you to look at the manual, because you still wouldn't have expected this problem.

You will next time.


 No.1045382

>>1045335

Yeah, I probably should have done that, and I probably should have logged in as well. I think there's an option that prevents you from being kicked if you're logged in.
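
(If you mean authenticating the requests, youtube-dl does have login options; whether being logged in would actually have prevented the disconnect is a guess, and cookies.txt here is assumed to be exported from a logged-in browser session:)

# reuse an existing browser session via an exported cookies.txt
youtube-dl -i --cookies 'cookies.txt' 'https://www.youtube.com/channel/CHANNEL_ID_HERE'

# or pass account credentials directly (two-factor auth may get in the way)
youtube-dl -i -u 'you@example.com' -p 'hunter2' 'https://www.youtube.com/channel/CHANNEL_ID_HERE'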


 No.1045640

>>1045207

gui is bloat


 No.1045734

Just because you can't use it right, u faggot


 No.1045750>>1045810 >>1047077

>>1045204 (OP)

Reminded me of the hell that was downloading a channel with 500+ videos. But yeah, as >>1045220 said, you just need to pass the -i flag which will ignore the errors and continue to download other videos.

Using it from within Python gives you more control but if you don't want to go that deep, then you could use something like this:

CHANNEL='UCD6VugMZKRhSyzWEWA9W2fg'
youtube-dl \
    --no-progress --no-warnings \
    -i --ignore-config --prefer-ffmpeg \
    -R 'infinite' --fragment-retries 'infinite' \
    --abort-on-unavailable-fragment --geo-bypass --no-check-certificate \
    --match-filter "channel_id = '${CHANNEL}'" --playlist-reverse \
    -w -o '%(upload_date)s.%(id)s/%(id)s.%(ext)s' -f 'bestvideo+bestaudio/best' \
    --download-archive 'ARCHIVE' --merge-output-format 'mkv' \
    --no-continue --write-info-json --write-thumbnail --all-subs \
    -- "https://www.youtube.com/channel/${CHANNEL}" "https://www.youtube.com/channel/${CHANNEL}/playlists" \
    2>&1 | tee "youtube-dl.log"
This will first download all the videos on the channel and then go through its playlists and download every video that belongs to this channel (useful when the channel keeps unlisted videos in some playlist), ignoring all errors. Downloaded videos get recorded in the ARCHIVE file, so you can skip the already downloaded videos and easily update the archive as the channel uploads new videos, or retry the download if some videos weren't downloaded due to errors. All videos will be in the best possible quality, contained within Matroska (.mkv). Besides the videos themselves, this will also download the metadata in the form of a JSON file (.info.json), the thumbnail at maximum resolution, and all available closed captions or subtitles.

The result would look something like this:

.
├── 20190127.0FW23bamIZI
│   ├── 0FW23bamIZI.info.json
│   ├── 0FW23bamIZI.jpg
│   └── 0FW23bamIZI.mkv
├── 20190226.wXo24imR_54
│   ├── wXo24imR_54.info.json
│   ├── wXo24imR_54.jpg
│   └── wXo24imR_54.mkv
├── 20190317.URJ_qSXruW0
│   ├── URJ_qSXruW0.info.json
│   ├── URJ_qSXruW0.jpg
│   └── URJ_qSXruW0.mkv
└── ARCHIVE
The only problem with this way of doing things is that when scanning the playlists for videos that belong to this channel, youtube-dl has to first download each video's page and check its metadata, which is slow and a waste of traffic. On top of that, there's no internal mechanism to prevent youtube-dl from analyzing already rejected videos. What you can do is get all the rejected videos from youtube-dl.log and keep them in the ARCHIVE file, so they'll be ignored entirely when you update the archive.

They all have the same message:

[youtube] 8UD50tPFCYo: Downloading webpage
[youtube] 8UD50tPFCYo: Downloading video info webpage
[download] Mario Tennis does not pass filter channel_id = 'UCPcIwIn5WO6_o_vXF8SXx3w', skipping ..
You can also do the same thing with the videos that weren't downloaded due to copyright errors or some other shit. Just find them in the log and write them into ARCHIVE before reattempting to archive the channel.
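
Rough sketch of scraping those IDs out of the log, assuming the log format shown above (an "[youtube] <id>: ..." line before each skip message); the copyright-blocked ones would need whatever error string they produce instead of the filter message:

# remember the last video ID youtube-dl announced; when a later line says it was
# skipped by the channel_id filter, emit it in the "youtube <id>" format that
# --download-archive expects, then de-duplicate ARCHIVE in place
awk '/^\[youtube\] / { id = $2; sub(/:$/, "", id) }
     /does not pass filter/ { print "youtube", id }' youtube-dl.log >> ARCHIVE
sort -u -o ARCHIVE ARCHIVE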

Personally, I just kept an EXCLUDE file with all the videos I don't need, and each time I needed to update a channel, before actually starting youtube-dl, I would switch to the directory and do something like this:

jq -r '"youtube \(.id)"' */*.info.json | cat - EXCLUDE > ARCHIVE
This is how I used to do it before switching to Python.


 No.1045810>>1045847

>>1045750

Forgot to mention, YouTube channels have an RSS feed (https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL}) so you can have, for example, a Cron job that checks the feed for new videos and archives them.
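
Rough sketch of that idea (untested; it assumes the feed still exposes <yt:videoId> elements and reuses the CHANNEL and ARCHIVE from the post above):

# check-feed.sh: pull the channel's RSS feed, extract the video IDs, and hand
# anything not already recorded in ARCHIVE to youtube-dl.
CHANNEL='UCD6VugMZKRhSyzWEWA9W2fg'
curl -s "https://www.youtube.com/feeds/videos.xml?channel_id=${CHANNEL}" \
    | grep -o '<yt:videoId>[^<]*' | sed 's/<yt:videoId>//' \
    | while read -r id; do
        grep -qx "youtube ${id}" ARCHIVE 2>/dev/null && continue
        youtube-dl -i --download-archive 'ARCHIVE' \
            -o '%(upload_date)s.%(id)s/%(id)s.%(ext)s' -f 'bestvideo+bestaudio/best' \
            --merge-output-format 'mkv' \
            -- "https://www.youtube.com/watch?v=${id}"
    done

# then in crontab, e.g. run it hourly:
# 0 * * * * cd /path/to/archive && ./check-feed.sh >> cron.log 2>&1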


 No.1045847

>>1045810

>YouTube channels have an RSS feed

Not all of them have it enabled, but most do.


 No.1047077

>>1045750

I feel like you could get around the duplicate video issue by simply doing a run grabbing video URLs only, feeding them into a file, and then filtering out duplicates. From there, do a second run using the sorted file as input.

If there are no duplicates (i.e., you're not searching multiple playlists and sources) it will run a little longer, but it will save you the hassle of re-downloading the webpage multiple times when there is a possibility of duplicates.
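
Something like this, maybe (untested sketch reusing the CHANNEL variable from above; I've dropped the channel_id filter for simplicity, and --get-id still fetches each video page on the first pass, just only once per unique video):

# Pass 1: list every video ID from the channel and its playlists, de-duplicate,
# and turn the IDs back into watch URLs for the batch file.
youtube-dl -i --get-id -- \
    "https://www.youtube.com/channel/${CHANNEL}" \
    "https://www.youtube.com/channel/${CHANNEL}/playlists" \
    | sort -u | sed 's#^#https://www.youtube.com/watch?v=#' > urls.txt

# Pass 2: download from the de-duplicated list.
youtube-dl -i -a urls.txt --download-archive 'ARCHIVE' \
    -o '%(upload_date)s.%(id)s/%(id)s.%(ext)s' -f 'bestvideo+bestaudio/best' \
    --merge-output-format 'mkv'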


 No.1047079

>>1045220

/thread


 No.1055111

HAPAS ARE SUPERIOR TO WHITES


 No.1057519

Yeah, right, and the moon is made of cheese.



