
identifying posters by their writing style


Thread replies: 48
Thread images: 3

File: DCqRW9cVwAESj_m.jpg large.jpg (18KB, 360x313px)
idk what's the most appropriate board to ask this on, but maybe some of you know something

do you think that it should be possible for a program to tell apart different posters, given that they write a long enough message, based on the characteristics of their text?

for example word choice,
using less common synonyms,
the way, among all the grammatically valid ones, of constructing a sentence,
capitalizing words or not
basically, to create a set of poster "models" and to give the estimated probability that post x belongs to poster y

for example, if i write another long post in the thread, the analyzing program should say that there is an 89% chance that the post belongs to me

this could be especially useful in generals, especially generals with shitposters and annoying users

sorry if i gave a messy explanation but i hope you understood what i mean
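
A minimal sketch of the idea above, not a tested tool: a naive Bayes classifier over crude style features (lowercased words, capitalization habit, punctuation) that returns a probability per known poster. The names `features`, `train` and `attribute`, the feature choices, and the uniform prior over posters are all assumptions made for illustration; real stylometry tools use richer features and much more text.

```python
import math
import re
from collections import Counter

def features(text):
    """Turn a post into a bag of crude style features (hypothetical choices)."""
    feats = Counter()
    for word in re.findall(r"[A-Za-z']+", text):
        feats["w:" + word.lower()] += 1
        # Track the capitalization habit separately from the word itself.
        feats["cap:upper" if word[0].isupper() else "cap:lower"] += 1
    for ch in text:
        if ch in ",.;:!?":
            feats["p:" + ch] += 1
    return feats

def train(posts_by_poster):
    """posts_by_poster: {poster: [post, ...]} -> {poster: feature counts}."""
    return {name: sum((features(p) for p in posts), Counter())
            for name, posts in posts_by_poster.items()}

def attribute(models, text):
    """Return {poster: probability} via naive Bayes with add-one smoothing
    and a uniform prior over the known posters."""
    feats = features(text)
    vocab = set().union(*models.values()) | set(feats)
    log_scores = {}
    for name, counts in models.items():
        total = sum(counts.values()) + len(vocab)
        log_scores[name] = sum(n * math.log((counts[f] + 1) / total)
                               for f, n in feats.items())
    # Normalize in log space so long posts do not underflow to zero.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}
```

Calling `attribute(train({...}), new_post)` gives numbers that sum to 1 across the posters you trained on, so a figure like "89%" only means "most likely among these candidates", not "certainly this person".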
>>
sure it's possible
but not very useful
>>
>>62197728
It is certainly possible to a degree: you can roughly sort people into categories, so long as they aren't paying attention to how they type up their posts.

Like you, for example, would go into my fucking retard category.
>>
>>62197728
on this board at least, there are about 10 users that i recognize in almost every thread, just based on their style of writing. it really differs between people, and that becomes very apparent when you spend 16 hours a day reading the shit they post. so im pretty sure that itd be possible for a program too.
>>
DELET
>>
It is indeed possible to identify a poster by the way they write their posts, capitalization, oxford commas, et cetera. but you're kinda fucked if i suddenly talk like a faggot
>>
Yes which is why I change my mannerisms between posts and always use random timestamps for my filenames
>>
>>62197747
>Like you, for example, would go into my fucking retard category.
may i ask you why?
>>
>>62197745
>but not very useful
i would be the next level of filtering shitposters
>>
>>62197877
it*
>>
>>62197770
Do you recognize me?
>>
File: op1.png (283KB, 573x575px)
>>62197728
>>
>>62197728
>do you think that it should be possible for a program to tell apart different posters
On slow boards with few posters, such as /g/, you don't even need a program to do this.
>>
>>62197728
OK. For example, you look like a redditor.
>>
>>62197863
First, your image.
Second, your sentence structure is worse than teen ESLs I know.
Third, your lack of capitalization and your inability to punctuate in a reasonable fashion.
>>
>>62197747
reddit kys
>>
i could make a program for identifying based pussy posters
>>
There are already algorithms for that. That's also why I will never be able to publish my erotic fanfiction ;_;
>>
>>62198000
First, i took a random picture since it's not really important and there is nothing really representative of it
Second, I wrote hastily, and I wonder how many foreign languages you speak. I wouldn't be surprised if you happen to be another arrogant anglo who is too retard to speak anything else
Third, this is fucking 4chan, learn2speechregister before tipping your fedora, mr. supreme gentleman
>>
pretty much impossible since the maximum possible degree of variance in sentence structure is quite limited
>>
>>62197831
>suddenly
>if
>>
>>62198098
what about other languages that offer a higher degree of variation? like romance languages
>>
>>62198090
I picked*
>>
>>62198105
i dont know since i dont speak any. i cant imagine it would be THAT much different though.
>>
>>62197728
It is possible. It is really only useful with xbox huge datasets. Google/advertising companies and alphabet agencies use it to link profiles across different services.

4chan shitposts are not complex enough for things more complicated than a wordfilter to be useful. You need a minimum amount of complexity to get a high confidence result.
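
A rough worked example of why short posts give low confidence, assuming a single hypothetical style marker (say, writing "dont" without the apostrophe) that poster A uses 80% of the time and poster B 40% of the time. The rates, the two-poster setup and the function name `posterior_a` are made up for illustration.

```python
import math

def posterior_a(n_hits, n_misses, p_a, p_b):
    """Posterior that poster A wrote the text, with a 50/50 prior over A and B,
    given how often one style marker showed up (hits) or did not (misses)."""
    log_a = n_hits * math.log(p_a) + n_misses * math.log(1 - p_a)
    log_b = n_hits * math.log(p_b) + n_misses * math.log(1 - p_b)
    return 1 / (1 + math.exp(log_b - log_a))

# Texts of growing length that all match A's typical 80% rate.
for n in (1, 5, 20, 80):
    hits = round(0.8 * n)
    print(n, round(posterior_a(hits, n - hits, 0.8, 0.4), 3))
```

With one observation the posterior for A is only 2/3. It climbs toward 1 as more observations pile up, because each one adds a little to the log-likelihood ratio.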
>>
>>62197728
4chan has hive mind mentality. all users write very similarly using memes and """culture""". so this is probably not possible.
>>
>>62197728
They can already do this, OP.

https://mobile.nytimes.com/blogs/bits/2012/01/03/software-helps-identify-anonymous-writers-or-helps-them-stay-that-way/?referer=
>>
>>62197986
Where did the 'reddit spacing' meme come from?
>inb4 reddit
>>
>>62198122
but i guess that annoying users in certain niche generals could be identified, since they don't just meme but they usually write long and toxic messages
you could also make the program analyze the archives
>>
>>62197728
Definitely possible. The secondary question is whether we can get training data from existing archives. Some boards used to have tagging systems. Does anyone have links to dumps from those times?
>>
>>62198145
do you know if there is any similar FOSS tool publicly available?
>>
>>62198154
You could, but I have only ever seen the process applied to huge datasets then outputting possible linked profiles with confidence intervals. Not sure how it would work as a filter.
>>
You'll need a trained model or an expert system. But God dammit it's possible. I'm willing to collaborate to do this shit
>>
File: the path of explosions.jpg (262KB, 1920x1090px)
>>62197770
;^)
>>
Of course. The NSA and FBI have used algorithms in the past to detect the typing patterns of criminals. For example, the FBI found an infamous pedophile online from how he greets people with "hiya" in chat rooms. Once they have a lead, they can document it and use an algorithm to detect patterns between the known typing and the potential suspect's typing, and it's shockingly as good as a fingerprint, granted you get a good sample. IBM, Intel and a few other tech giants already have the technology implemented.
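
Comparing a known sample against candidate authors is what stylometric distance measures do. One published measure is Burrows' Delta. The sketch below is a simplified version of it (z-scored frequencies of the most common words, then the mean absolute difference) and is not a claim about what any agency or company actually runs. The function names and the `top_n=30` default are assumptions.

```python
import re
import statistics
from collections import Counter

def words(text):
    return re.findall(r"[a-z']+", text.lower())

def rel_freqs(text, vocab):
    """Relative frequency of each vocab word in the text."""
    toks = words(text)
    counts = Counter(toks)
    n = max(len(toks), 1)
    return [counts[w] / n for w in vocab]

def burrows_delta(known, unknown, top_n=30):
    """known: {author: sample text}. Returns {author: delta}; lower = closer."""
    pooled = Counter(words(" ".join(known.values())))
    vocab = [w for w, _ in pooled.most_common(top_n)]
    profiles = {a: rel_freqs(t, vocab) for a, t in known.items()}
    cols = list(zip(*profiles.values()))
    means = [statistics.mean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    # Words used at the same rate by every known author carry no signal.
    keep = [i for i, s in enumerate(stdevs) if s > 0]
    if not keep:
        raise ValueError("known samples do not differ on any frequent word")

    def z(freqs):
        return [(freqs[i] - means[i]) / stdevs[i] for i in keep]

    zu = z(rel_freqs(unknown, vocab))
    return {a: statistics.mean([abs(x - y) for x, y in zip(z(p), zu)])
            for a, p in profiles.items()}
```

The result is only a ranking among the authors you supply. With small samples or few candidates the scores are noisy, which fits the "granted you get a good sample" caveat.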


P.S
I can already tell geographic region of everyone in this thread
>>
>>62198153
When you have several paragraphs and put spaces between them, that's the leddit style.
>>
>>62198413
>I can already tell geographic region of everyone in this thread
You are either a rusky, a pole, a pajeet or a bot.
>>
Stylometry
>>
>>62198420

Especially when it's actually sentences spaced out.

Like this and they do it thinking it looks better and easier to read.

When it's really just very retarded and inefficient use of page space.

And they will do it on every forum.
>>
>>62197747
I think it'd lose effectiveness because most /g/entlemen are smart enough to mix up their patterns when they're same-fagging, at least I am.
>>
>>62198154
for instance, this guy is me too.
>>
It is definitely possible.
The question is if we want that, and I'd say the answer is "No."

I couldn't care less about someone identifying the posts I have made on 4chan, but I don't want to have the knowledge regarding other posters forced on me.
I want to judge every post on its own merits, not based on the history of its creator.
>>
I doubt it's possible in a random thread, but I see it frequently on /pol/ because it also has flags so you can make an association between flag and writing style and you come up with some unique characters, for example:
Malaysian mike, Greek tranny poster, the argie that spams about the septuagint in Christian threads, etc
>>
>>62197728
https://psal.cs.drexel.edu/index.php/Main_Page

that's just the open sores javashit academic version
It's possible and it's adopted by internet cops worldwide. Be aware of it next time you post in that pedo forum, Mr. J. Gustavson.
>>
>>62197728

Depends.

As long as they write enough words in each post, or use enough unique strings / sequences of strings, and you have enough posts, ya I'd say you could absolutely do that.
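
One way to read "sequences of strings" is as character n-grams. A small sketch, assuming trigrams and cosine similarity as the comparison (both are illustrative choices, and the function names are made up):

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams, keeping case and punctuation
    because those are part of the writing style."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Build one profile per poster from all their known posts, then compare a new post's profile against each. The more text behind each profile, the less a single unusual phrase sways the score.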
>>
>>62197728
I think you'll find this interesting.

https://medium.com/@amuse/how-the-nsa-caught-satoshi-nakamoto-868affcef595
>>
https://en.wikipedia.org/wiki/Stylometry
>>
>>62197728
why the ugly whore?


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.