I'm doing some web scraping in Node and came to this dilemma:

Thread replies: 18
Thread images: 2

File: nodejs-logo.png (14KB, 600x300px)
I'm doing some web scraping in Node and came to this dilemma:

What happens if I scrape a website and that website contains self-executing functions? Do they get triggered by Node? Also, what if I query the website with jQuery, does it trigger the functions too?

Anyway, that seems like a pretty good anti-scraping device.
>>
>>59256733
Use Selenium.
>>
>>59256733
Depends on how you're scraping.

Cheerio only parses the markup and never runs client-side code. If you need the page's scripts to run, use a headless browser, and with a promise you can scrape everything after execution.

Best anti-scraping shit is a closed API, or changing your classes and tags every few days.
>>
File: 1474551047583.jpg (265KB, 3508x2334px)
>>59256780
best anti scraping is silently feeding wrong data
>>
troll thread? troll thread.
>>
>>59256733
You can't possibly be this brain-damaged.
>>
>>59256733
>using Node to scrape a website
How many LOCs until callbacks and async start giving you nightmares?
>>
>>59256733
To execute the code you scrape from a website, you would have to do it on purpose, most likely with eval() (or by dynamically requiring a downloaded .js file). Both of those ideas are very stupid. A plain HTTP request just hands you the page as text, so you can safely scrape any website you want.

jQuery is a JavaScript library providing easier DOM access and some utility functions. It can be used along with Node.js if you wish, but there are alternatives that are better suited to web scraping.
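
A minimal sketch of the point about eval above, assuming the page has already been downloaded into a string. The HTML and the `pwned` flag are made up for illustration; the idea is only that a self-executing function inside scraped markup is inert text until something deliberately evaluates it.

```javascript
// Hypothetical page source, as a scraper would receive it: plain text.
const html = `
<html><body>
  <h1 id="title">Hello</h1>
  <script>(function () { globalThis.pwned = true; })();</script>
</body></html>`;

// Pulling data out of the string is ordinary text processing.
const title = /<h1[^>]*>([^<]*)<\/h1>/.exec(html)[1];

// The self-executing function is just characters inside the string.
const scriptSource = /<script>([\s\S]*?)<\/script>/.exec(html)[1];

console.log(title);                   // Hello
console.log(typeof globalThis.pwned); // undefined: nothing ran

// Only an explicit eval(scriptSource) (or requiring a downloaded file)
// would run it, which is exactly the thing not to do.
```

The regexes are only good enough for this toy page; a real scraper would use an HTML parser, but the conclusion is the same because parsing is not executing.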
>>
>>59256733
hahhahhaha
but on a serious note, use Python for this shit
>>
>>59257808
Are you a time traveler from 2008?
>>
>>59256733
>What happens if I scrape a website and that website contains self-executing functions?
Short answer is no. Even with jQuery.
>>
lol this is like writing system scripts in php
>>
>>59256780
>Best anti-scraping shit is a closed API, or changing your classes and tags every few days.
You can't really stop scraping. If a human can read it, a program can scrape it. If you don't want information to be accessed, don't put it on your website.
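
A rough sketch of why rotating classes and tags only slows a scraper down, assuming a made-up product page whose markup changes between two days. Both HTML snippets and the `extractPrice` helper are hypothetical; the extractor ignores the markup entirely and keys off the text a human would read.

```javascript
// Two hypothetical versions of the same page; only the markup changed.
const monday = '<div class="price-box"><span class="lbl">Price:</span> <b class="val">$19.99</b></div>';
const friday = '<section class="x9f2"><em class="q1">Price:</em> <i class="zz">$19.99</i></section>';

// Drop the tags, then anchor on the visible label instead of class names.
function extractPrice(html) {
  const text = html.replace(/<[^>]+>/g, ' ').replace(/\s+/g, ' ');
  const match = /Price:\s*(\$\d+(?:\.\d{2})?)/.exec(text);
  return match ? match[1] : null;
}

console.log(extractPrice(monday)); // $19.99
console.log(extractPrice(friday)); // $19.99
```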
>>
>>59256779
I'm a noob when it comes to coding, but isn't Selenium a little slow?
>>
>>59257808
PhantomJS is the best, but it's not Node
>>
>>59256733
Usually the JS of the scraped pages is executed in its own, isolated context. If there was an exploit in Node or the libraries, it could access the program's data, but that's another story.

>do functions get triggered by node
It depends. Most libraries allow you to choose whether JS is allowed or not.
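
As a sketch of what "isolated context" and "choose whether JS is allowed" can look like, here is a toy version built on Node's built-in `vm` module. The `loadPage` helper, its `runScripts` option and the page script are assumptions made up for this example, not the API of any particular scraping library. Note that the Node documentation states `vm` is not a security mechanism, so this shows separation of globals, not a safe way to run untrusted code.

```javascript
const vm = require('vm');

// Hypothetical script lifted from a scraped page.
const pageScript = '(function () { this.tracked = true; counter = 41 + 1; })();';

// Run page scripts only when the caller opts in, loosely mirroring the
// "allow JS or not" switch that scraping libraries tend to expose.
function loadPage(script, { runScripts = false } = {}) {
  const sandbox = vm.createContext({}); // separate global object
  if (runScripts) {
    vm.runInContext(script, sandbox, { timeout: 100 });
  }
  return sandbox;
}

const off = loadPage(pageScript);
const on = loadPage(pageScript, { runScripts: true });

console.log(off.counter);               // undefined
console.log(on.counter, on.tracked);    // 42 true
console.log(typeof globalThis.counter); // undefined: host globals untouched
```

With scripts disabled nothing runs at all; with them enabled, the page's globals land on the sandbox object rather than on the scraper's own global scope.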
>>
>>59256733
Just screenshot the site and OCR it
>>
>>59256733
If JS is loading the content, just bypass the JS and request the content directly from wherever the script fetches it. JS is executed within the browser/client, so those requests aren't something the server can hide.
This is a 4chan archive - all of the content originated from that site.