
That moment when you realise only 2 lines of code prevent everyone from snooping your site


Thread replies: 12
Thread images: 2

File: GoogleTotallyChecksThis.png (1KB, 124x75px)
That moment when you realise only 2 lines of code prevent everyone from snooping your site.
Do you honestly think that tech giants like Google actually respect robots.txt?
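
For anyone wondering which 2 lines are meant: presumably the blanket disallow below. This is a minimal sketch of how a well-behaved crawler would consult it, using Python's standard `urllib.robotparser`. The bot name `ExampleBot` and the example.com URL are made up for illustration, and nothing here forces a crawler to run this check at all.

```python
from urllib.robotparser import RobotFileParser

# The "2 lines" in question: a blanket disallow for every user agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler asks first; a rude one simply never calls is_allowed().
    # "ExampleBot" and the URL are hypothetical.
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/secret"))
```

The check lives entirely on the crawler's side, so the file only "prevents" anything for bots that choose to honour it.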
>>
The only reason I'd ever respect a robots.txt is if I'm releasing something public on GitHub. Even then, if your site doesn't have a public API in 2017, I'm going to get the information I need by crawling it with different common user agents and varying proxies until you wise up.
>>
>>59378219
I don't have a robots.txt, only X-Robots-Tag headers.

robots.txt is effectively useless for keeping pages out of search results. It only blocks crawling, so Google can still index the URL and just replaces the description with a placeholder along the lines of "This page can't be displayed due to robots.txt", which completely defeats the point.

It's good to not have robots.txt
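
A rough sketch of the header-only approach, using nothing but Python's standard library. The handler name `NoIndexHandler`, the helper `fetch_robots_header`, and the `noindex, nofollow` value are choices made for this example, not anyone's actual setup. The server runs on a throwaway loopback port just so the header can be read back.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoIndexHandler(BaseHTTPRequestHandler):
    """Serve one page and ask crawlers not to index it via a response header."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # The directive travels with the response instead of a robots.txt file.
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_robots_header() -> str:
    """Start the server on a free loopback port and read the header back."""
    server = HTTPServer(("127.0.0.1", 0), NoIndexHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        value = response.getheader("X-Robots-Tag", "")
        response.read()
        conn.close()
        return value
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_robots_header())
```

A crawler only sees this header if it is allowed to fetch the page in the first place, which is why it doesn't mix well with a robots.txt that blocks the same URL.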
>>
>>59378280
You can see this with chiru.no. You can't Google it, and it has no robots.txt.
>>
File: frog.png (378KB, 600x760px)
>>59378219
>write website entirely in javascript
>crawlers can't index anything because they don't support running js
>as an added bonus, freetards also won't visit your website
>>
>>59378336
>He still thinks crawlers can't index JavaScript-based websites :^)
>>
>>59378336
m8, the crawler renders the website's loaded state in a VM or something and then interacts with it.
>>
>>59378336
JavaScript is rendered before Google indexes the page. Otherwise every result would say "Enable JavaScript" in the description. It's a lot of bullshit to go through just to do something incorrectly.

Also, the page still gets indexed even if the description won't load.
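
To illustrate the "Enable JavaScript" point, here is a small sketch of what a crawler that does not execute scripts would extract from a JS-only page. It uses the standard `html.parser` module. The sample page and the `VisibleText` and `static_text` names are invented for the example, and real crawlers are obviously far more involved.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect the text a non-rendering crawler sees (scripts are never run)."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Ignore whitespace and anything inside <script> or <style>.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def static_text(html: str) -> str:
    """Return the text content of html without executing any JavaScript."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page whose real content only appears after the script runs.
JS_ONLY_PAGE = """
<html><body>
<noscript>Enable JavaScript</noscript>
<div id="app"></div>
<script>document.getElementById("app").textContent = "Actual content";</script>
</body></html>
"""

if __name__ == "__main__":
    print(static_text(JS_ONLY_PAGE))  # prints "Enable JavaScript"
```

A crawler that renders the page first would see "Actual content" instead, which is the behaviour described above.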
>>
>>59378219
what did he mean when he said this?
>>
>>59378219
The primary purpose of robots.txt is to tell crawling bots what isn't worth indexing: pages that are generated automatically and thus pointless to index, or pages that link back to themselves through randomly generated URLs, where a bot could otherwise get stuck in an infinite loop.

robots.txt is a tool for websites to help crawling bots, not hurt them or block them. Crawling bots are not obligated to respect it.
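
The infinite-loop case can be sketched with a toy site. Everything here is hypothetical: the `/calendar/` section, the `links_on` function standing in for real page fetches, and the bot name `ExampleBot`. The only real API used is `urllib.robotparser` from the standard library.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that fences off an endless auto-generated section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
"""


def links_on(path: str) -> list[str]:
    """Toy site: every calendar page links to a 'next day' page, forever."""
    if path == "/":
        return ["/about", "/calendar/0"]
    if path.startswith("/calendar/"):
        day = int(path.rsplit("/", 1)[1])
        return [f"/calendar/{day + 1}"]
    return []


def crawl(respect_robots: bool, max_pages: int = 50) -> list[str]:
    """Breadth-first crawl of the toy site, capped at max_pages visits."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    seen, queue, visited = {"/"}, deque(["/"]), []
    while queue and len(visited) < max_pages:
        path = queue.popleft()
        visited.append(path)
        for link in links_on(path):
            if link in seen:
                continue
            if respect_robots and not rules.can_fetch("ExampleBot", link):
                continue  # the site told us this section is a waste of time
            seen.add(link)
            queue.append(link)
    return visited


if __name__ == "__main__":
    print(crawl(respect_robots=True))        # ['/', '/about']
    print(len(crawl(respect_robots=False)))  # 50, stopped only by the page cap
```

The bot that honours the file finishes after two pages. The one that ignores it keeps following "next day" links until its own page cap stops it.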

If you're trying to block crawlers, keep in mind that there is literally no way for you to detect a crawler that perfectly mimics your regular users. It's like bailing out a flooding aircraft carrier with a teaspoon.
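
As a sketch of why detection is a losing game, consider the most common naive check, matching on the User-Agent string. The marker list and the function name `looks_like_bot` are made up for this example, and the two user agent strings are just plausible samples.

```python
# Hypothetical substrings a site might look for in the User-Agent header.
KNOWN_BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


def looks_like_bot(user_agent: str) -> bool:
    """Naive check: flag a request if its User-Agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)


if __name__ == "__main__":
    honest = "Googlebot/2.1 (+http://www.google.com/bot.html)"
    # A script can send exactly the same string a desktop browser would.
    disguised = "Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"
    print(looks_like_bot(honest))     # True
    print(looks_like_bot(disguised))  # False
```

The check only catches crawlers that announce themselves. Anything that copies a browser's headers passes, and the same holds for every other signal the client controls.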
>>
>>59378446
Don't assume my gender ever again, faggot nigger.
>>
>>59378567
Don't assume my race and sexuality. I'm a non binary gender, pansexual jewish male lesbian


This is a 4chan archive - all of the content originated from that site.