
Tech Pony - Assistance Requested

This is a blue board which means that it's for everybody (Safe For Work content only). If you see any adult content, please report it.

Thread replies: 14
Thread images: 8

Hey guys, I need some assistance.

Do any of you ponyfags (or whatever we call ourselves these days) have any experience with parsing wikitext?

I'm trying to parse a Wikipedia article. [pic related] Originally I was trying to do this by using JS to parse the HTML on the page, but that proved a bit complicated to do effectively (a couple of little structural problems).

I was thinking the most effective way to accomplish the task would be to parse the wikitext. But until yesterday, I had never even heard of wikitext.

I don't know how to parse it or how to use the API. I've tried reading the documentation, but I don't understand most of it.

Do any of you guys know how to do this?
>>
>>30626564

This is the wikitext.

https://en.wikipedia.org/w/api.php?action=query&titles=My%20Little%20Pony:%20Friendship%20Is%20Magic%20(season%207)&prop=revisions&rvprop=content&format=json
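For reference, here is a minimal sketch of how that URL can be built for any title and how the wikitext can be pulled out of the JSON it returns. The helper names `buildQueryUrl` and `extractWikitext` are made up for illustration. It assumes the legacy response layout you get with `format=json` and no `rvslots` parameter, where the page text sits under the `"*"` key of the first revision.

```javascript
// English Wikipedia API endpoint, same as in the link above.
const ENDPOINT = "https://en.wikipedia.org/w/api.php";

// Build a query URL that asks for the latest revision content of one page.
// (buildQueryUrl is an illustrative helper name, not part of any library.)
function buildQueryUrl(title) {
  const params = new URLSearchParams({
    action: "query",
    titles: title,
    prop: "revisions",
    rvprop: "content",
    format: "json",
  });
  return `${ENDPOINT}?${params}`;
}

// Map each returned page title to its wikitext.
// Assumes the legacy layout: query.pages[<pageid>].revisions[0]["*"].
function extractWikitext(response) {
  const pages = (response.query && response.query.pages) || {};
  const result = {};
  for (const page of Object.values(pages)) {
    if (page.revisions && page.revisions.length > 0) {
      result[page.title] = page.revisions[0]["*"];
    }
  }
  return result;
}
```

When calling this from a page on another domain, the API also needs `origin=*` added to the query string for anonymous cross-origin requests.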
>>
>>30626564
>>>/g/
>>
>>30626571
Can't post pony-related shit on /g/.

>>30626564
This is what I was trying to use: https://gist.github.com/anonymous/d087f24a911a2729e1c874c679ced0d3#file-wip-wikiparserclasses-js

But as I said, it was a bit complicated, between the citations in the table cells, the structure of some of the nodes, and the fact that some cells contain data like "Story by: name & name2\nTeleplay by: name3 and name4". I could write a parser for that, but it wouldn't be very clean or dynamic.
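A rough sketch of what a parser for that kind of credits cell could look like. `parseCredits` is a hypothetical helper, not code from the gist, and it assumes each line has the form "Role: names" with names separated by "&", "and", or commas, and that citation markers look like "[12]".

```javascript
// Turn "Story by: name & name2\nTeleplay by: name3 and name4" into
// { "Story by": ["name", "name2"], "Teleplay by": ["name3", "name4"] }.
// Hypothetical helper; the cell format is an assumption based on the example.
function parseCredits(cellText) {
  const credits = {};
  for (const rawLine of cellText.split(/\r?\n/)) {
    // Drop citation markers such as "[12]" before parsing.
    const line = rawLine.replace(/\[\d+\]/g, "");
    const colon = line.indexOf(":");
    if (colon === -1) continue; // not a "Role: names" line
    const role = line.slice(0, colon).trim();
    const names = line
      .slice(colon + 1)
      .split(/\s*(?:&|,|\band\b)\s*/) // "and" only as a whole word
      .map((name) => name.trim())
      .filter((name) => name.length > 0);
    if (role && names.length > 0) credits[role] = names;
  }
  return credits;
}
```

Cells with no role prefix (a single writer name, for example) yield an empty object here, so a caller would need a fallback for those.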

So I'm thinking parsing the wikitext would be easier.
>>
>>30626564
Does the API help?
https://www.mediawiki.org/wiki/API:Main_page
>>
>>30627027
Actually, yes. I seem to be understanding it better today.

I've learned quite a lot.

I've learned about templates, transclusions, and includes.

I've learned how the linking system works.

I've learned that I can basically query the main page "List of My Little Pony: Friendship Is Magic episodes", get the wikitext, and match each line of the form "{{:My Little Pony: Friendship Is Magic (season [0-9])}}", which gives us the title of each season page. I then encode each title and query the page by that title to get its wikitext, which includes the episode table as defined at: https://en.wikipedia.org/wiki/Template:Episode_list/sublist
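The matching step could be sketched like this. `findSeasonPageTitles` is an illustrative name, and the pattern uses `[0-9]+` rather than a single digit so it would also accept two-digit season numbers.

```javascript
// Collect the season page titles transcluded in the list page's wikitext,
// i.e. every line of the form {{:My Little Pony: Friendship Is Magic (season N)}}.
// Illustrative helper; the pattern follows the line format described above.
function findSeasonPageTitles(listWikitext) {
  const pattern =
    /^\{\{:(My Little Pony: Friendship Is Magic \(season [0-9]+\))\}\}[ \t]*$/gm;
  return Array.from(listWikitext.matchAll(pattern), (match) => match[1]);
}
```

Each returned title can then be passed through `encodeURIComponent` (or `URLSearchParams`) before it goes into the `titles=` part of the query.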

From there, I can use some JS to match and parse only the relevant wikitext and begin extracting the data.
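One way to do that extraction, as a sketch only: find each `{{Episode list...}}` call by counting braces, then split its body on top-level pipes so that pipes inside `[[links]]` and nested `{{templates}}` are left alone. Both function names are hypothetical, and the sketch ignores positional arguments and `{{{triple-brace}}}` parameters.

```javascript
// Return the text between "{{" and the matching "}}" for every template
// call whose name starts with namePrefix. Hypothetical helper.
function findTemplateBodies(wikitext, namePrefix) {
  const bodies = [];
  let searchFrom = 0;
  while (true) {
    const start = wikitext.indexOf("{{" + namePrefix, searchFrom);
    if (start === -1) break;
    let depth = 0;
    let end = -1;
    let i = start;
    while (i < wikitext.length - 1) {
      const pair = wikitext.slice(i, i + 2);
      if (pair === "{{") {
        depth++;
        i += 2;
      } else if (pair === "}}") {
        depth--;
        i += 2;
        if (depth === 0) {
          end = i;
          break;
        }
      } else {
        i++;
      }
    }
    if (end === -1) break; // unbalanced braces: stop rather than guess
    bodies.push(wikitext.slice(start + 2, end - 2));
    searchFrom = end;
  }
  return bodies;
}

// Split a template body on top-level "|" and collect its name=value pairs.
function parseTemplateParams(body) {
  const parts = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < body.length; i++) {
    const pair = body.slice(i, i + 2);
    if (pair === "{{" || pair === "[[") {
      depth++;
      current += pair;
      i++;
    } else if (pair === "}}" || pair === "]]") {
      depth--;
      current += pair;
      i++;
    } else if (body[i] === "|" && depth === 0) {
      parts.push(current);
      current = "";
    } else {
      current += body[i];
    }
  }
  parts.push(current);
  const params = {};
  for (const part of parts.slice(1)) { // parts[0] is the template name
    const eq = part.indexOf("=");
    if (eq === -1) continue; // positional argument, skipped in this sketch
    params[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
  }
  return params;
}
```

Usage would be along the lines of `findTemplateBodies(wikitext, "Episode list").map(parseTemplateParams)`, giving one object per episode row. The values still contain raw wikitext (links, nested templates, refs) that needs its own cleanup.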
>>
>>30627027
>>30627858
I've been taking notes on what I'm learning.

Hopefully the official MLP wikia uses the same system and has the same API.

I intend to eventually make a script to dynamically acquire information about every single MLP episode, including the basic episode meta information, description & title, some thumbnails (4 or 5 per episode), a link to the video URL, etc.

Basically a "Complete MLP Episode Index - Archive Snapshot System".
>>
>>30627858
>I've learned
>I've learned
>I've learned
>I've learned
all your post is missing is "Dear Princess Celestia" and "Your faithful student"
>>
>>30627890
Sorry, I'm just glad I've actually learned something for once.

I only really get the opportunity to learn new stuff like once every 2 or 3 months. Basically when I can get my hands on some drugs, and a free weekend to use them.

The rest of the time it's just work (Mon - Fri) or exercise (Sat, Sun).

I don't really get the opportunity to sit down and write code or study things anymore, because there's never any time, and because I have to be clean most of the time.

Still. Today and Yesterday were kind of nice. I actually felt like a student again. I miss that.
>>
>>30627858
>>30627880
Wikia does a few things differently from MediaWiki (e.g. the syntax).
http://www.wikia.com/api/v1/
>>
File: 1460934318461.jpg (114KB, 640x640px)
>>30627971
they pay people to write code you know. change your job to one where you can learn things. even a Java code monkey has a lot of opportunities to learn
>>
>>30628032
I'm trying to get a code job. But for now, I've been unable to get one. Currently working as a support engineer. (people call in; I help them fix their appliance)

I finally found a small, elegant way to extract the wikitext for all mlp season-pages.

If you're on the page: https://en.wikipedia.org/wiki/List_of_My_Little_Pony:_Friendship_Is_Magic_episodes

And you open up the console and run:
https://gist.github.com/anonymous/247652a9e114aebd154d08d19946a22a#file-pullthatdata-js

It will extract the data, and show it in a new tab, like so: [pic related]
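The gist itself is not reproduced here, so the following is only a guess at the general shape of such a script, not its actual code. It fetches the list page, picks out the season page titles, and then requests all of them in one call by joining the titles with "|". The names `fetchWikitext` and `pullSeasonWikitext` are invented, and `fetchJson` is an injectable parameter so the logic can be tried without a network.

```javascript
const API = "https://en.wikipedia.org/w/api.php";

// Fetch the latest wikitext of several pages in one request.
// fetchJson defaults to the built-in fetch(); it is a parameter only so
// that a fake can be substituted when testing.
async function fetchWikitext(
  titles,
  fetchJson = (url) => fetch(url).then((response) => response.json())
) {
  const params = new URLSearchParams({
    action: "query",
    titles: titles.join("|"), // several titles in a single query
    prop: "revisions",
    rvprop: "content",
    format: "json",
    origin: "*", // only needed for anonymous cross-origin requests
  });
  const data = await fetchJson(`${API}?${params}`);
  const out = {};
  for (const page of Object.values((data.query && data.query.pages) || {})) {
    if (page.revisions) out[page.title] = page.revisions[0]["*"];
  }
  return out;
}

// List page -> season page titles -> wikitext of every season page.
async function pullSeasonWikitext(listTitle, fetchJson) {
  const list = await fetchWikitext([listTitle], fetchJson);
  const seasonTitles = Array.from(
    (list[listTitle] || "").matchAll(
      /\{\{:(My Little Pony: Friendship Is Magic \(season [0-9]+\))\}\}/g
    ),
    (match) => match[1]
  );
  return fetchWikitext(seasonTitles, fetchJson);
}
```

In a browser console, `pullSeasonWikitext("List of My Little Pony: Friendship Is Magic episodes").then(console.log)` would print an object keyed by season page title. Large responses may come back incomplete with a `continue` field, which this sketch does not handle.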

>>30627996
Thanks for the info. I'll probably begin on that one very soon, just finishing up a couple last-minute things
>>
>>30627996
>http://www.wikia.com/api/v1/
This is going to be so much easier to parse.
>>
People usually are more willing to help if you specifically say what your end goal is, not your current method/roadblock. Why do you want this wikitext?


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.