I'm doing some web scraping in Node and came to this dilemma:
What happens if I scrape a website and that website contains self-executing functions? Do they get triggered by Node? Also, what if I query the website with jQuery, does it trigger the functions too?
Anyway, that seems like a pretty good anti-scraping device.
>>59256733
Use selenium.
>>59256733
Depends on how you're scraping.
Cheerio only parses the HTML and doesn't run client-side code. If you need the page after its scripts have executed, use a headless browser and scrape once its promise resolves.
Best anti scraping shit is a closed API or changing your classes and tags every few days.
>>59256780
best anti scraping is silently feeding wrong data
troll thread? troll thread.
>>59256733
You can't possibly be this braindamaged
>>59256733
>using Node to scrape a website
How many LOCs until callbacks and async start giving you nightmares?
>>59256733
In order to execute the code you scrape from a website you would most likely have to use the eval function (or dynamically require a downloaded .js file). Both of those ideas are very stupid. As long as you do neither, the page's scripts are just text, so you can safely scrape any website you want.
jQuery is a JavaScript library providing easier DOM access and some utility functions. It can be used along with Node.js if you wish, but there are alternatives that are better suited to web scraping.
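Rough sketch of what I mean, in case it helps. The HTML string and the `pwned` flag are made up for illustration, and a real scraper would use a proper HTML parser instead of regexes, but the point is the same: what you download is a string, and a self-executing function inside it is just more characters.

```javascript
// Hypothetical page source, as an HTTP client would hand it to you: one string.
const html = `
<html><body>
  <h1 class="title">Hello</h1>
  <script>(function () { globalThis.pwned = true; })();</script>
</body></html>`;

// Pulling data out of the string never runs the script inside it.
const title = /<h1[^>]*>([^<]*)<\/h1>/.exec(html)[1];
const scriptSource = /<script>([\s\S]*?)<\/script>/.exec(html)[1];

console.log(title);                   // Hello
console.log(typeof globalThis.pwned); // undefined: the IIFE never ran
console.log(scriptSource.length > 0); // true: the script is just text in a variable
```

The only way `pwned` gets set is if you pass `scriptSource` to something like eval yourself.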
>>59256733
hahhahhaha
but on a serious note, use python for this shit
>>59257808
Are you a time traveler from 2008?
>>59256733
>What happens if I scrape a website and that website contains self-executing functions?
Short answer: no, they don't get triggered. Even with jQuery.
lol this is like writing system scripts in php
>>59256780
>Best anti scraping shit is a closed API or changing your classes and tags every few days.
You can't really stop scraping. If a human can read it, a program can scrape it. If you don't want information to be accessed, don't put it on your website.
>>59256779
I'm a noob when it comes to coding but isn't selenium a little slow?
>>59257808
PhantomJS is the best, but it's not Node
>>59256733
Usually the JS of the scraped pages is executed in its own, isolated context. If there was an exploit in Node or the libraries, it could access the program's data, but that's another story.
>do functions get triggered by node
It depends. Most libraries allow you to choose whether JS is allowed or not.
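Here's a minimal sketch of the isolated-context idea using Node's built-in `vm` module. This is not necessarily what any given scraping library does internally, and `pageScript`, `counter` and `secret` are made-up names. Also note the Node docs themselves say `vm` is not a security mechanism, so treat it as separation of globals, not a real sandbox.

```javascript
const vm = require('vm');

// Hypothetical script lifted from a scraped page.
const pageScript = '(function () { counter = 42; secretSeen = typeof secret; })();';

// Something that lives in the scraper's own global scope.
globalThis.secret = 'program data';

// Fresh global object for the page code; nothing from our globals is copied in.
const sandbox = vm.createContext({});
vm.runInContext(pageScript, sandbox, { timeout: 1000 });

console.log(sandbox.counter);           // 42
console.log(sandbox.secretSeen);        // undefined (the page code could not see `secret`)
console.log(typeof globalThis.counter); // undefined (its globals did not leak into ours)
```

If you never call anything like `runInContext`, the script never runs at all, which is the "JS not allowed" case.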
>>59256733
Just screenshot the site and OCR it
>>59256733
If JS is loading the content, then just bypass the JS and request the content directly from wherever the script pulls it. JS is executed within the browser/client, so those requests aren't something the server can hide.
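Something like this, as a sketch. The JSON shape, the `items`/`title` fields and the function names are all assumptions for illustration; the real endpoint and payload are whatever shows up in your browser's Network tab. `fetch` being a global assumes Node 18 or newer.

```javascript
// Hypothetical JSON body, shaped like what a page's own JS might load from its backend.
const sampleBody = '{"items":[{"id":1,"title":"First"},{"id":2,"title":"Second"}]}';

// Pull the fields we care about straight out of the payload: no DOM, no page JS.
function extractTitles(body) {
  const data = JSON.parse(body);
  return data.items.map((item) => item.title);
}

// Request the endpoint directly instead of rendering the page.
async function scrapeApi(url) {
  const res = await fetch(url, { headers: { Accept: 'application/json' } });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return extractTitles(await res.text());
}

console.log(extractTitles(sampleBody)); // [ 'First', 'Second' ]
```

You'd call `scrapeApi` with the URL you found; the parsing is kept separate so you can try it on a saved response first.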