Hey guys, I need some assistance.
Do any of you ponyfags (or whatever we call ourselves these days) have any experience with parsing wikitext?
I'm trying to parse a Wikipedia article. [pic related] Originally I was trying to do this by using JS to parse the HTML on the page, but that was proving complicated to do effectively (a couple of little structural problems).
I was thinking the most effective way to accomplish the task would be to parse the wikitext. But until yesterday, I had never even heard of wikitext.
I don't know how to parse it, or how to use the API. I've tried reading the documentation, but I don't understand most of it.
Do any of you guys know how to do this?
>>30626564
This is the wikitext.
https://en.wikipedia.org/w/api.php?action=query&titles=My%20Little%20Pony:%20Friendship%20Is%20Magic%20(season%207)&prop=revisions&rvprop=content&format=json
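For anyone following along, here is a rough sketch of how that URL could be built and how the wikitext could be pulled out of the response. The helper names `buildWikitextUrl` and `extractWikitext` are made up for illustration, and the response shape (pages keyed by page id, content under `"*"`) is what I'd expect from the legacy JSON format with these parameters; check an actual response before relying on it.

```javascript
// Hypothetical helper: builds the same kind of query URL as the one above.
function buildWikitextUrl(title) {
  const url = new URL("https://en.wikipedia.org/w/api.php");
  url.search = new URLSearchParams({
    action: "query",
    titles: title,      // URLSearchParams handles the encoding
    prop: "revisions",
    rvprop: "content",
    format: "json",
  }).toString();
  return url.toString();
}

// Hypothetical helper: digs the wikitext out of a parsed JSON response.
// Assumes the legacy response layout: query.pages is an object keyed by
// page id, and the revision content sits under the "*" key.
function extractWikitext(response) {
  const page = Object.values(response.query.pages)[0];
  return page.revisions[0]["*"];
}
```

You'd fetch the URL, parse the body as JSON, and pass the result to `extractWikitext`.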
>>30626571
Can't post pony-related shit on /g/.
>>30626564
This is what I was trying to use: https://gist.github.com/anonymous/d087f24a911a2729e1c874c679ced0d3#file-wip-wikiparserclasses-js
But as I said, it was a bit complicated, between the citations in the table cells, the structure of some of the nodes, and the fact that some cells contain data like "Story by: name & name2\nTeleplay by: name3 and name4". I could write a parser for that, but it wouldn't be very clean or dynamic.
So I'm thinking parsing the wikitext would be easier.
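For what it's worth, the credits-cell case might not be too ugly. This is only a sketch: `parseCredits` is a made-up name, and it assumes every line looks like "Role: name & name" or "Role: name and name", which may not hold for every cell.

```javascript
// Hypothetical helper: turns a multi-line credits cell into
// { role: [names...] }. Assumes one "Role: names" entry per line.
function parseCredits(cell) {
  const credits = {};
  for (const line of cell.split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip lines with no "Role:" prefix
    const role = line.slice(0, idx).trim();
    const names = line
      .slice(idx + 1)
      // Split on "&" or the word "and". A name that itself contains
      // " and " would be split wrongly; that case is not handled here.
      .split(/\s*&\s*|\s+and\s+/)
      .map((name) => name.trim())
      .filter(Boolean);
    credits[role] = names;
  }
  return credits;
}

const example = parseCredits(
  "Story by: name & name2\nTeleplay by: name3 and name4"
);
```

Citations inside the cell would still need stripping first, so this only covers part of the problem.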
>>30626564
Does the API help?
https://www.mediawiki.org/wiki/API:Main_page
>>30627027
Actually, yes. I seem to be understanding it better today.
I've learned quite a lot.
I've learned about templates, transclusions, and includes.
I've learned how the linking system works.
I've learned that I can query the main page, "List of My Little Pony: Friendship Is Magic episodes", get its wikitext, and match each line of the format "{{:My Little Pony: Friendship Is Magic (season [0-9])}}", which gives me the title of each season page. I then encode each title and query that page to get its wikitext, which includes the episode table as defined at https://en.wikipedia.org/wiki/Template:Episode_list/sublist.
From there, I can use some JS to match and parse only the relevant wikitext, and begin extracting the data.
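Roughly, the matching step could look like this. It's a sketch, not what I actually ran: `findSeasonTitles` is a made-up name, and I've used `[0-9]+` instead of `[0-9]` on the assumption that a two-digit season could show up some day.

```javascript
// Hypothetical helper: finds every season-page transclusion of the form
// {{:My Little Pony: Friendship Is Magic (season N)}} at the start of a
// line and returns the page titles (without the braces and leading colon).
function findSeasonTitles(wikitext) {
  const pattern =
    /^\{\{:(My Little Pony: Friendship Is Magic \(season [0-9]+\))\}\}/gm;
  return Array.from(wikitext.matchAll(pattern), (match) => match[1]);
}

// Encode each title so it can go into the "titles" query parameter.
function encodeTitles(titles) {
  return titles.map((title) => encodeURIComponent(title));
}
```

Each encoded title can then be dropped into the same kind of API query used for the main list page.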
>>30627027
>>30627858
I've been taking notes on what I'm learning.
Hopefully the official MLP Wikia uses the same system, and will have the same API.
I intend to eventually make a script to dynamically acquire information about every single MLP episode, including the basic episode meta information, description & title, some thumbnails (4 or 5 per episode), a link to the video URL, etc.
Basically a "Complete MLP Episode Index - Archive Snapshot System".
>>30627858
>I've learned
>I've learned
>I've learned
>I've learned
all your post is missing is "Dear Princess Celestia" and "Your faithful student"
>>30627890
Sorry, I'm just glad I've actually learned something for once.
I only really get the opportunity to learn new stuff like once every 2 or 3 months. Basically when I can get my hands on some drugs, and a free weekend to use them.
The rest of the time it's just work (Mon - Fri) or exercise (Sat, Sun).
I don't really get the opportunity to sit down and write code or study things anymore, because there's never any time, and because I have to be clean most of the time.
Still. Today and Yesterday were kind of nice. I actually felt like a student again. I miss that.
>>30627858
>>30627880
Wikia does a few things differently from MediaWiki (e.g. the syntax).
http://www.wikia.com/api/v1/
>>30627971
they pay people to write code you know. change your job to one where you can learn things. even a Java code monkey has a lot of opportunities to learn
>>30628032
I'm trying to get a code job. But for now, I've been unable to get one. Currently working as a support engineer. (people call in; I help them fix their appliance)
I finally found a small, elegant way to extract the wikitext for all mlp season-pages.
If you're on the page: https://en.wikipedia.org/wiki/List_of_My_Little_Pony:_Friendship_Is_Magic_episodes
And you open up the console and run:
https://gist.github.com/anonymous/247652a9e114aebd154d08d19946a22a#file-pullthatdata-js
It will extract the data, and show it in a new tab, like so: [pic related]
>>30627996
Thanks for the info. I'll probably begin on that one very soon, just finishing up a couple last-minute things
>>30627996
>http://www.wikia.com/api/v1/
This is going to be so much easier to parse.
People usually are more willing to help if you specifically say what your end goal is, not your current method/roadblock. Why do you want this wikitext?