
Let's discuss statistical parsing /g/


Thread replies: 3
Thread images: 2

Let's discuss statistical parsing /g/

Ultimately, my goal is to be able to distinguish the different parts of any given signal, data structure, or whatever.

I don't want to know what it means; I just want to know that there are distinct symbols or patterns. But I also want to be able to capture "containers", or things that wrap around discrete chunks in the signal.
Let me give you 2 instances:
>Natural language:
John told Alex that Christine said, "Hey, this is a shitty example sentence"
>Some program
function(param, arg2){return param - arg2;}

In the natural language example, the wrapping symbols would be the pauses (spoken) or quotes (written) that wrap and cluster the quoted words into 1 item.
Likewise, the function has 2 sets of symbols (the parentheses and the braces) that "wrap" and contain other information.

Again, I don't care about "meaning", just distinction.
I believe this is possible. I believe there are common behaviors of "wrappers" and key symbols. Key symbols, I think, can be found with some type of statistical approach, where you count the occurrences of certain combinations and then use a score to decide whether a combination is worth the risk of being treated as a keyword or whatever.
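
Here's a rough sketch of the counting-and-scoring idea, assuming the signal is already split into tokens (or is just a string of characters). The score I picked is pointwise mutual information (PMI), which is only one possible choice, and the names `score_pairs`, `key_pairs`, `min_count` and `threshold` are made up for illustration:

```python
import math
from collections import Counter

def score_pairs(tokens, min_count=2):
    """Score adjacent pairs by PMI: log2(p(a,b) / (p(a) * p(b))).

    A high score means the two items occur together more often than
    their individual frequencies would suggest.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # too rare to trust (assumed cutoff)
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores

def key_pairs(tokens, threshold=1.0, min_count=2):
    """Return the pairs whose score clears the (assumed) threshold, best first."""
    scores = score_pairs(tokens, min_count)
    keep = [pair for pair, s in scores.items() if s >= threshold]
    return sorted(keep, key=lambda pair: -scores[pair])

tokens = "a b x a b y a b z".split()
print(key_pairs(tokens))
```

The `min_count` cutoff is the "is it worth the risk" part: a pair seen only once gets no score at all, no matter how surprising it looks.
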
I don't know how you would do this for wrappers, unless they can literally be deduced as everything that isn't a "key symbol".
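
One possible heuristic for wrappers, as a sketch only: treat two distinct symbols as an opener/closer pair if they occur equally often and a running depth counter never goes negative and ends at zero. This assumes you already have a set of candidate symbols (for example, whatever is left after removing key symbols). It does not handle symmetric wrappers like quotes, where the opener and closer are the same symbol, and the names `is_balanced`, `candidate_wrappers` and `min_count` are hypothetical:

```python
from collections import Counter
from itertools import permutations

def is_balanced(signal, opener, closer):
    """True if every closer is preceded by an unmatched opener and all are matched."""
    depth = 0
    seen = False
    for sym in signal:
        if sym == opener:
            depth += 1
            seen = True
        elif sym == closer:
            depth -= 1
            if depth < 0:
                return False  # closer appeared before its opener
    return seen and depth == 0

def candidate_wrappers(signal, symbols, min_count=2):
    """Return ordered (opener, closer) pairs that behave like wrappers in signal."""
    counts = Counter(signal)
    return [
        (a, b)
        for a, b in permutations(symbols, 2)
        if counts[a] >= min_count
        and counts[a] == counts[b]
        and is_balanced(signal, a, b)
    ]

print(candidate_wrappers("f(a(b)c){x{y}(z)}", "(){}"))
```

On a one-line sample like the function above, every symbol occurs once, so with `min_count=1` nearly any two symbols in the right order look balanced. The check only becomes meaningful with enough data for the counts to rule out coincidences.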

I'm not a very disciplined programmer, so a lot of my theories are fuzzy.
I was thinking that once the key relationships are found (and it's likely that there aren't any more to be found), you can cluster all the items in the 1-dimensional signal
>>
File: pooinloo.png (1MB, 880x759px) Image search: [Google]
>>62096898
>Sometimes you have to stop thinking so much and just go to the designated shitting street
>>
My heart tells me to find a few friends among Muslim migrants, and go on a purge of faggots and trans degenerates. I must pursue my heart.

Also Python is the greatest language of all.


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.