
Hello /sci/. I'm making software to analyze boards and


Thread replies: 19
Thread images: 2

File: logo.png (7KB, 458x446px)
Hello /sci/.

I'm making software to analyze boards and threads on 4chan.
How do I measure the intelligence of a board? I was thinking of something including image ratio, length of posts and sentiment objectivity.
Also, what other metrics should I consider?
>>
>>8498102
The only thing considered a measure for intelligence is the intelligence quotient, there are tests for that. If you want to determine the intelligence of a poster by their posts, you need labelled data to train a regressor. You don't have that. Everything else is just guesswork.

Anyway, if you want to look at interesting quantities by themselves you may look at features such as:
>Number of different words per thread
>Length of posts
>Number of misspelled words
>Number of unique posts
>Number of original posters per thread
>Number of "meme words"
>Number of images posted
>Number of unique images (images that haven't been posted so far)
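A rough sketch of how a few of these could be counted for one thread. This assumes each post is a plain dict with the hypothetical keys "text" and "image_md5" (not the real API field names), and it treats "unique posts" as distinct post texts, which is only one possible reading.

```python
import re
from statistics import mean

# Very crude tokenizer: lowercase runs of letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def thread_features(posts, seen_images=None):
    """Compute a few per-thread counts.

    posts: list of dicts with hypothetical keys "text" (str) and,
           optionally, "image_md5" (any hashable image identifier).
    seen_images: set of image identifiers already seen on the board;
           it is updated in place so it can be shared across threads.
    """
    seen_images = set() if seen_images is None else seen_images
    words, texts = [], []
    images = new_images = 0
    for post in posts:
        text = post.get("text", "")
        texts.append(text)
        words.extend(WORD_RE.findall(text.lower()))
        md5 = post.get("image_md5")
        if md5 is not None:
            images += 1
            if md5 not in seen_images:
                new_images += 1
                seen_images.add(md5)
    return {
        "different_words": len(set(words)),
        "mean_post_length": mean(len(t) for t in texts) if texts else 0.0,
        "unique_posts": len(set(texts)),
        "images": images,
        "unique_images": new_images,
    }


posts = [
    {"text": "Hello sci", "image_md5": "a"},
    {"text": "hello again"},
    {"text": "hello again", "image_md5": "a"},
]
features = thread_features(posts)
```

Misspellings and "meme words" would need a dictionary or word list on top of this, so they are left out here.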

It would be interesting to do the following by the way:
Look at only the OP and train a classifier to judge whether the thread will reach its bump limit (or a regressor to determine how many replies the thread will have).
>>
>>8498115
Yeah, I already got half of those you said.
A really cool one would be something along the lines of keywords: words that are the most popular on the board but are not on the list of the 10,000 most common English words. I wonder what the outcome would be for different boards. I also want to include plotting graphs. And I don't want to do anything with machine learning just yet.
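Something like this might do for the keyword idea, no machine learning needed. It is only a sketch: `common_words` stands in for the 10,000-word list (which would have to be loaded from somewhere), and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")


def board_keywords(post_texts, common_words, top_n=10):
    """Return the top_n most frequent words that are NOT in common_words.

    post_texts: iterable of post bodies as plain strings.
    common_words: set of lowercase words to ignore (assumed to be the
                  "most popular English words" list).
    """
    counts = Counter(
        word
        for text in post_texts
        for word in WORD_RE.findall(text.lower())
        if word not in common_words
    )
    return counts.most_common(top_n)


texts = ["the proof is trivial", "trivial proof", "the lemma is trivial"]
keywords = board_keywords(texts, common_words={"the", "is"}, top_n=2)
```

Running the same function over each board's posts and comparing the lists side by side would give the per-board comparison.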
>>
>>8498102

You should include a function to quantify the dankness of memes posted.
>>
>>8498102
>>8498115
Punctuation and length of words too
>>
File: tumblr_nblvjp9NDF1qgp2eyo7_1280.jpg (478KB, 1280x1920px)
>>8498102
You can try making a browser extension which tracks all the posts one makes and which includes an IQ test. After some time you'll have enough labeled data to train a regressor.
>>
>>8498459
Also OP has too much free time if he considers doing this
>>
>>8498462
I could see myself doing this.
I have nothing else to work on.
>>
>>8498115
Count ad hominem as well.
>>
>>8498102
You should also consider reply rate. Second, take all the posts on the board and break each one up into its words. See which words are commonly used there, the rate at which unique words are used (basically how much a board is shitposting and meme spouting), and tag certain words as flags or indicators of low IQ. I suspect posts that lean on overused phrases like "redpill me on X" or buzzwords like "sjw", "cuck" or "autist" come from lower-IQ posters, since a good number of posters are quite frankly incapable of expressing their opinion or engaging in proper debate. It might also be interesting to check reply chains to see how often discussions contain or end in short single-sentence posts with these buzzwords.
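A minimal sketch of the word-level part, assuming posts are already plain strings. The flag list, the 8-word cutoff for a "short" post, and the metric names are all made up for illustration; nothing here says anything about IQ by itself.

```python
import re

WORD_RE = re.compile(r"[a-z']+")


def vocabulary_stats(post_texts, flag_words, short_post_max_words=8):
    """Unique-word rate, flag-word rate, and count of short flagged posts.

    flag_words: set of lowercase single words chosen by whoever runs this.
    short_post_max_words: hypothetical cutoff for what counts as "short".
    """
    total = 0
    unique = set()
    flagged = 0
    short_flagged_posts = 0
    for text in post_texts:
        words = WORD_RE.findall(text.lower())
        total += len(words)
        unique.update(words)
        hits = sum(1 for word in words if word in flag_words)
        flagged += hits
        if hits and len(words) <= short_post_max_words:
            short_flagged_posts += 1
    return {
        "unique_word_rate": len(unique) / total if total else 0.0,
        "flag_word_rate": flagged / total if total else 0.0,
        "short_flagged_posts": short_flagged_posts,
    }


texts = [
    "sjw",
    "this is a longer reply about the topic with sjw in it somewhere",
]
stats = vocabulary_stats(texts, flag_words={"sjw"})
```

Multi-word phrases like "redpill me on X" would need substring or pattern matching instead of single-word lookup, and reply chains need the quote links, so both are outside this sketch.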
>>
How are you gonna get the data?
>>
>>8498658
Cuck.
>>
>>8498884
Maybe he's going to analyze the HTML code or something.
>>
>>8498602
Good luck identifying those.

Number of swear words is another one worth looking at.
>>
>>8498884
>refresh the page
>>
>>8498964
>>8498962
>>8498884

https://github.com/4chan/4chan-API

I'm downloading every board's json data as we speak.
The process takes 2-3 hours if you comply with the API's rules.
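For anyone wondering what that looks like, here is a rough sketch, not my actual downloader. It assumes the endpoints and the one-request-per-second rule as I read them in the API README (`a.4cdn.org/{board}/threads.json` and `a.4cdn.org/{board}/thread/{no}.json`); check the README before relying on either. The `fetch`, `sleep` and `clock` parameters are only there so it can be tried without touching the network.

```python
import json
import time
import urllib.request

API_ROOT = "https://a.4cdn.org"  # assumed from the API README


def fetch_json(url):
    """Download one URL and decode it as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def download_board(board, fetch=fetch_json, min_interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Fetch the thread list of a board, then every thread, waiting at
    least min_interval seconds between consecutive requests."""
    last_request = None

    def polite_fetch(url):
        nonlocal last_request
        if last_request is not None:
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        return fetch(url)

    # threads.json is assumed to be a list of pages, each holding a
    # "threads" list whose entries carry the thread number under "no".
    pages = polite_fetch(f"{API_ROOT}/{board}/threads.json")
    threads = {}
    for page in pages:
        for entry in page["threads"]:
            number = entry["no"]
            threads[number] = polite_fetch(
                f"{API_ROOT}/{board}/thread/{number}.json"
            )
    return threads
```

With one request per second, the run time is roughly the total number of threads across all boards in seconds, which is where the hours go.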
>>
>>8498982
>>8498962
I am no expert of course, but couldn't he use Python? Tools like scrapy dot org make web scraping relatively easy.
>>
>>8498250

I know another dude from a different imageboard (krautchan.net/int) who wrote software that kept records of everything, and we could see which country the most-used words came from.

I will see if I can get in touch with him. It's been a few years since I last talked to him.
>>
>>8498884
Python has many advanced libraries for that.


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.