
How to write software that scrapes music off bandcamp and downloads it?


Thread replies: 22
Thread images: 2

File: 1500240574593.jpg (1MB, 1840x3264px)
How to write software that scrapes music off bandcamp and downloads it?

I don't wanna scrape anything off bandcamp, I just want to learn how these things work.
>>
>>61608765
learn 2 regex

if you want example code look at flexget
>>
Look at the bandcamp page source. Find where the link to the music file is and wget that file.
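
For anyone who wants to see the "find the link, then fetch it" idea in code, here is a minimal sketch using only the Python standard library. It assumes the audio URL appears as a plain `src` or `href` attribute in the HTML. That is an assumption for illustration, not a description of bandcamp: many sites build the URL in JavaScript instead. The sample page, the `example.com` URLs and the names `AudioLinkFinder`, `find_audio_urls` and `download` are all made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve

AUDIO_EXTENSIONS = (".mp3", ".ogg", ".flac", ".wav")


class AudioLinkFinder(HTMLParser):
    """Collects URLs from src/href attributes that look like audio files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Ignore any query string when checking the extension.
                path = value.split("?", 1)[0].lower()
                if path.endswith(AUDIO_EXTENSIONS):
                    # Turn relative links into absolute ones.
                    self.urls.append(urljoin(self.base_url, value))


def find_audio_urls(html, base_url):
    finder = AudioLinkFinder(base_url)
    finder.feed(html)
    return finder.urls


def download(url, filename):
    # Roughly what `wget -O filename url` does.
    urlretrieve(url, filename)


# Hypothetical page source, only for demonstration.
SAMPLE_PAGE = """
<html><body>
  <audio controls><source src="/media/track01.mp3?token=abc"></audio>
  <a href="about.html">About</a>
  <a href="https://cdn.example.com/track02.ogg">Track 2</a>
</body></html>
"""

if __name__ == "__main__":
    for url in find_audio_urls(SAMPLE_PAGE, "https://example.com/album/"):
        print(url)
```

Each URL returned by `find_audio_urls` could then be passed to `download`, or simply printed and piped to wget.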
>>
>>61608765
https://github.com/Otiel/BandcampDownloader
>>
>>61608765
such a sexy boy
>>
>>61608765
install gentoo
>>
Fiddle around with the page to see how it works. Sometimes it's as easy as a recursive wget (ignoring robots.txt), other times you have to code some logic around HTTP requests, and sometimes (depending on how much of a cunt the webdev is) you have to emulate a web browser with something like PhantomJS.
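
To give a rough idea of what "recursive wget" and "logic on HTTP requests" amount to, here is a small sketch of a same-host crawler in standard-library Python. It is not how wget is implemented. The names `LinkCollector`, `fetch` and `crawl` and the User-Agent string are invented for the example, and it doesn't read robots.txt at all. The page fetcher is passed in as a parameter so the crawl logic can be tried without touching the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    # Some servers reject urllib's default User-Agent, so send our own.
    request = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=50):
    """Breadth-first walk of same-host pages; returns visited URLs in order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        collector = LinkCollector()
        collector.feed(fetch_page(url))
        visited.append(url)
        for href in collector.hrefs:
            # Resolve relative links and drop #fragments.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Calling `crawl("https://example.com/")` would use the real `fetch`. Passing `fetch_page=` a function that reads from a dict of canned pages lets you test the traversal offline.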

>>61608820
>Parsing html with regex
>>
>>61611776
what would you use instead of regex?
>>
>>61611018
This, wget is very powerful if you know how to use it.
>>
>>61611798
Well, how about a proper parser?

You can probably parse HTML with regex, but chances are that you're doing it wrong and working at least twice as hard. I certainly wouldn't recommend it.
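
To make that concrete, here is a small side-by-side sketch in Python. The HTML snippet and the regex are made up for illustration. The regex is the kind of pattern people tend to write first, and the parser is the standard library's `html.parser`, which is only one possible choice of "proper parser".

```python
import re
from html.parser import HTMLParser

# Hypothetical markup showing a few things real pages do.
HTML = """
<a href="one.mp3">one</a>
<a class="dl" href='two.mp3'>two</a>
<A HREF=three.mp3>three</A>
<!-- <a href="old.mp3">commented out</a> -->
"""

# A typical first attempt: expects lowercase, double quotes, href first.
NAIVE = re.compile(r'<a href="([^"]+)"')


class HrefParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips quotes.
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def hrefs_with_regex(html):
    return NAIVE.findall(html)


def hrefs_with_parser(html):
    parser = HrefParser()
    parser.feed(html)
    return parser.hrefs


if __name__ == "__main__":
    print(hrefs_with_regex(HTML))   # ['one.mp3', 'old.mp3']
    print(hrefs_with_parser(HTML))  # ['one.mp3', 'two.mp3', 'three.mp3']
```

The regex misses the single-quoted and unquoted links and picks up the one inside the comment. You can keep patching the pattern, but that is the "working twice as hard" part.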
>>
>>61612037
>I certainly wouldn't recommend it.
so, how would you do it?
>>
>>61612037
>Well, how about a proper parser?
way to sidestep the question. how would you do it?
>>
>>61612176
Fetch the HTML and use vim to rip out all the relevant links, which are then passed to a shell script that downloads them.
>>
Check out BAS - Browser Automation Studio. Dunno if you can make it download music, but it's the easiest way to go if you have no coding skills and need web automation. And it's completely FREE.
>>
>>61608765
>Google is your friend
If you don't know shit about technology, why do you come to /g/?
>>
>>61612267
>a proper parser
>'just do it all manually! that is what I would do'
the whole point is to automate the process.
>>
>>61612545
Vim is not an ordinary text editor. You can run a Vim macro from a bash shell and it will do the work for you.
>>
File: theponyhecomes.png (156KB, 740x695px)
>>61611798
>>61612037
>>61612176
With a proper XML parser and xpath expressions.
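
A minimal sketch of what that looks like, using Python's built-in `xml.etree.ElementTree`. Two caveats: ElementTree only supports a limited subset of XPath, and it needs well-formed XML, so real-world HTML often won't load without a more forgiving parser. The document and the `tracklist` class name below are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment.
XHTML = """<html>
  <body>
    <div class="tracklist">
      <a href="one.mp3">One</a>
      <a href="two.mp3">Two</a>
    </div>
    <div class="footer">
      <a href="about.html">About</a>
    </div>
  </body>
</html>"""


def track_links(xhtml):
    root = ET.fromstring(xhtml)
    # Every <a> with an href that sits directly inside the tracklist div.
    anchors = root.findall(".//div[@class='tracklist']/a[@href]")
    return [a.get("href") for a in anchors]


if __name__ == "__main__":
    print(track_links(XHTML))  # ['one.mp3', 'two.mp3']
```

The expression says where in the tree the links live, so the footer link is skipped without any extra filtering code.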
>>
read the source code of soundscrape and you'll have a pretty good idea
>>
>>61612699
My favorite answer on the entire site
>>
I don't know the bandcamp website, but I build web scrapers with Python + BeautifulSoup.
When I need JavaScript, I use Python + Selenium.
>>
>>61612176
Not him, but XPath is meant to do that. You shouldn't try to parse HTML with regex

This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.