
/tech/ - Technology


File: cb3b81920ca783d⋯.jpg (157.46 KB, 500x750, 2:3, saintpeter.jpg)


 No.999782>>999926 >>1000120

Anyone ever scrape time-series data from twitter? I'm trying to work out a solution with Selenium, but it's going to require a lot of runtime.

 No.999848>>999926 >>1000124

Use the Twitter API directly: https://developer.twitter.com/en/docs.html

or pick a language and use a real scraping library

https://scrapy.org/

https://jsoup.org/

http://www.nokogiri.org/

Writing a program that goes off and opens a browser to do it is going to be the slowest possible way to do this.
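
A minimal sketch of the library approach, assuming the page serves its data as static HTML. It uses Python's built-in `html.parser` as a stand-in for a full library like Scrapy, and the sample markup (posts carrying a `<time datetime="...">` tag) is made up for illustration, not taken from any real site.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeTagParser(HTMLParser):
    """Collects the datetime attribute of every <time> tag it sees."""

    def __init__(self):
        super().__init__()
        self.timestamps = []

    def handle_starttag(self, tag, attrs):
        if tag != "time":
            return
        value = dict(attrs).get("datetime")
        if value:
            self.timestamps.append(datetime.fromisoformat(value))


def extract_timestamps(html):
    """Returns all <time datetime=...> values in the HTML, oldest first."""
    parser = TimeTagParser()
    parser.feed(html)
    return sorted(parser.timestamps)


# Hypothetical markup; a real page will look different.
SAMPLE = """
<div class="post"><time datetime="2018-11-02T09:30:00">Nov 2</time></div>
<div class="post"><time datetime="2018-11-01T17:05:00">Nov 1</time></div>
"""

if __name__ == "__main__":
    for ts in extract_timestamps(SAMPLE):
        print(ts.isoformat())
```

This only sees what is in the raw HTML response. Anything the page builds with JavaScript after loading won't be there.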


 No.999926>>1000124

>>999782 (OP)

Depends on which kind of data you are interested in.

>>999848

The API is ideal. Scraping tools are quicker than browser-automation solutions like Selenium, but they can't execute JavaScript, so they might not always work.


 No.999930

>Anyone ever scrape time-series data from twitter? I'm trying to work out a solution with Selenium, but it's going to require a lot of runtime.

Kill me, Pete


 No.1000022>>1000124

Just look at their unofficial API through your browser's network activity monitor. It's pretty easy to understand, and you don't have to put up with the bullshit rate limiting that comes with the official API.
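
A rough sketch of replaying such a request outside the browser and turning the result into a time series. Everything site-specific here is an assumption: the endpoint URL, the `q` and `cursor` parameters, and the `items` / `created_at` fields are placeholders for whatever actually shows up in the network monitor.

```python
import json
from collections import Counter
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; copy the real URL and parameters from the
# request shown in the browser's network monitor.
BASE_URL = "https://example.com/internal/search.json"


def build_url(query, cursor=None):
    """Builds the request URL, adding a pagination cursor when given."""
    params = {"q": query}
    if cursor:
        params["cursor"] = cursor
    return BASE_URL + "?" + urlencode(params)


def fetch_page(url):
    """Fetches one page of JSON; real headers usually need to mirror the browser's."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


def daily_counts(payload):
    """Buckets items by UTC day, assuming each has a Unix 'created_at' field."""
    counts = Counter()
    for item in payload["items"]:
        day = datetime.fromtimestamp(item["created_at"], tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))
```

In practice you'd loop `fetch_page(build_url(query, cursor))` until the response stops handing back a cursor, and sleep between requests so you don't get blocked.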


 No.1000120>>1000124

>>999782 (OP)

Just use the devtools protocol if you don't want to emulate a browser. Running Chromium in headless mode is one of the harder-to-detect ways to scrape data, and the most likely to succeed, since these websites expect an actual browser to execute JavaScript.
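
A minimal sketch of the headless route. Note it uses Chromium's `--dump-dom` command-line flag rather than the DevTools protocol itself, since talking to the protocol needs a WebSocket client. The binary name `chromium` is an assumption; on some systems it is `chromium-browser` or `google-chrome`.

```python
import shutil
import subprocess


def headless_dump_command(url, binary="chromium"):
    """Builds the argv for a headless run that prints the rendered DOM."""
    return [binary, "--headless", "--disable-gpu", "--dump-dom", url]


def render(url, binary="chromium", timeout=60):
    """Returns the DOM as serialized by headless Chromium once the page has loaded."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found on PATH")
    result = subprocess.run(
        headless_dump_command(url, binary),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

The output of `render()` can then go through an ordinary HTML parser. Pages that load more data on scroll won't be fully captured this way; that's where driving the browser over the protocol comes in.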


 No.1000124

>>999848

>>999926

Need to execute the JS, unfortunately. Trying to avoid going the API route, because I need a lot more historical data than is available at my price point.

>>1000022

Gonna look into this, thanks.

>>1000120

Makes sense, thanks.


 No.1000172

kill me, pete

KILL ME NOW DO IT I BEG YOU



