


Thread replies: 15
Thread images: 4

File: chart.png (193KB, 800x600px)
Hello /sci/,
I'm trying to estimate the probability of two statistically independent events. I know that in conditional probability
P(A|B) = P(A&B)/P(B)
but for statistically independent events:
P(A&B) = P(A)*P(B)
which means that
P(A|B) = P(A)

This results in charts like this one, where each of my categories ends up with the same probability for each other category. The lines only vary according to the real value of the number of members of that category.
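A minimal sketch of why the lines coincide, using made-up head counts (the industry and band names and numbers below are placeholders, not the data behind the chart). Once P(A&B) is built as P(A)*P(B), dividing by P(A) hands back P(B) for every category:

```python
# Hypothetical marginal head counts (illustrative only, not the thread's data).
industry_counts = {"Agriculture": 200, "Retail": 500, "Finance": 300}
band_counts = {"band 1": 400, "band 2": 350, "band 3": 250}

total = sum(industry_counts.values())
assert total == sum(band_counts.values())  # both margins describe the same people

p_industry = {name: n / total for name, n in industry_counts.items()}
p_band = {name: n / total for name, n in band_counts.items()}

# Independence assumption: P(industry & band) = P(industry) * P(band).
joint = {(i, b): p_industry[i] * p_band[b] for i in p_industry for b in p_band}

# Conditional probability: P(band | industry) = P(industry & band) / P(industry).
conditional = {
    i: {b: joint[(i, b)] / p_industry[i] for b in p_band}
    for i in p_industry
}

# Every industry prints the same row, namely P(band) itself.
for industry, row in conditional.items():
    print(industry, {band: round(p, 3) for band, p in row.items()})
```

The only thing that differs between industries is the head count each row gets scaled by, which matches what the chart shows.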

Is there a better transformation I can do on my data, such that each of these lines is different, and so that each category produces a different probability?

I can try and explain myself more clearly if that doesn't make sense.

Thanks for your help.
>>
>>8065235
>Yvalues
>Xvalues
>>
File: chart.png (198KB, 800x600px)
>>8065270
They are placeholder variable names. Is this better for you?
>>
File: chartkey.png (10KB, 300x240px)
It occurs to me that these events actually aren't independent. If someone is employed in Agriculture, for instance, there is some chance that they earn band 2 per week, some chance they earn band 3 per week, etc. And this is dependent upon the industry of employment. I just don't know what this probability is, and that's what I want to determine. How do I work this out from what I have?
Pic is the earnings bands the numbers represent.
>>
>>8065284
Are you the guy who posted this type of data with income and age paired and job and age paired and wanted to know how to estimate the income and job pairings?
>>
>>8065295
Yeah, I'm that guy. Turns out computing the number of permutations of a matrix with ~5000000 entries in either axis is really, really computationally intensive. I'm trying to use probability to make it easier.
>>
>>8065284
Yes, obviously income and job are not independent. The chance of earning X given you have job Y is the number of people who earn X and have job Y divided by the number of people who have job Y. But you don't actually have this data, do you?
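For reference, this is all the calculation would be if the joint counts existed. A small sketch with invented counts (the `joint_counts` table and the helper name are hypothetical):

```python
# Hypothetical joint head counts: joint_counts[job][band] is the number of
# people with that job who earn in that band. Invented numbers.
joint_counts = {
    "Agriculture": {"band 2": 30, "band 3": 50, "band 4": 20},
    "Finance": {"band 2": 10, "band 3": 30, "band 4": 60},
}


def conditional_given_job(counts):
    """Return P(band | job) = count(job, band) / count(job) for every job."""
    result = {}
    for job, by_band in counts.items():
        job_total = sum(by_band.values())
        result[job] = {band: n / job_total for band, n in by_band.items()}
    return result


p_band_given_job = conditional_given_job(joint_counts)
for job, row in p_band_given_job.items():
    print(job, row)
```

Here the rows differ between jobs because the joint counts carry the dependence. The marginal totals alone cannot.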

>>8065302
Well, there is an easier way to do it computation-wise, which is to let the matrix take continuous values instead of just integers. Then you can use calculus to find the average value of each element. Unfortunately this involves solving for the hypervolume of a generalized polyhedron in a very large number of dimensions (this is called a polytope). Basically, the range allowable for each element can be represented as a side of the polytope, and the hypervolume represents the probability of a particular value for the element, which allows you to calculate its expected value. Unfortunately this is probably way over your head and still too difficult to program.

Essentially you can't do what you're trying to do. Even if you could get the expected value of each element, this is just the average value it would take if all permutations were equally likely. But we know not all permutations of a job and an income are equally likely.
>>
File: Industry.png (37KB, 800x600px)
>>8065318
Ah, that second paragraph is a good explanation of what I'm trying to do, mathematically. Thanks for it.

No, you're right, I don't have the data concerning how many people earn X with job Y. That's what I'm trying to infer from what I've got.
So it's P(X|Y) = P(X&Y)/P(Y), then. But since I'm trying to infer P(X|Y) and I don't have P(X&Y), I was trying to use independence to assert that P(X&Y) == P(X)*P(Y), which isn't true.
So I can't work this out from what I have? Can I estimate it in any sensible way?

Side note, I was able to make some nice graphs from the data I did actually have.
>>
>>8065324
Yes, you can estimate it by the method discussed in the previous thread or this one, but this just uses the assumption that all possible permutations of the data are equally likely. It's the best way of estimating without any more information, but it's not accurate. This won't give you a uniform distribution between incomes and jobs, as jobs with many people and incomes with many people will result in a higher estimated number of people with both that job and that income. But that's all that the information you have can tell you.
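A sketch of one reading of this, stated as an assumption: if "all permutations equally likely" means every way of pairing the income labels with the people is equally likely, then the average count in each cell is (row total * column total) / N, so nothing has to be enumerated. This is not the same model as treating every integer matrix with the right margins as equally likely. The toy population below is invented and small enough to brute-force as a check:

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical tiny population: a job label per person and, separately,
# a list of income bands whose pairing with the people is unknown.
jobs = ["farm", "farm", "shop", "shop", "shop"]
bands = ["low", "high", "high", "low", "low"]

job_names = sorted(set(jobs))
band_names = sorted(set(bands))
n_people = len(jobs)

# Brute force: average the job-by-band table over every possible pairing.
totals = {(j, b): 0 for j in job_names for b in band_names}
n_perms = 0
for perm in permutations(bands):
    n_perms += 1
    for job, band in zip(jobs, perm):
        totals[(job, band)] += 1
brute_force = {cell: Fraction(t, n_perms) for cell, t in totals.items()}

# Shortcut: (row total * column total) / N, no enumeration needed.
shortcut = {
    (j, b): Fraction(jobs.count(j) * bands.count(b), n_people)
    for j in job_names
    for b in band_names
}

for cell in sorted(shortcut):
    print(cell, brute_force[cell], shortcut[cell])
```

Under that reading the average is just the independence estimate N*P(X)*P(Y) again, which is why it cannot reveal how income really depends on job.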
>>
>>8065350
Okay, the method proposed was computing the average value in each cell of the matrix for each possible permutation. However, this is computationally beyond the scope of my project. Is there a way I can determine the average value for each cell without computing each permutation? There was another method proposed where you started by assuming an even distribution, eg the values of the first row are all 1/n where n is the number of columns, but this only works if you have the same number of rows and columns, and I don't.
>>
>>8065354
Yes, I told you just now. Use calculus to find the expected value of the matrix. But that will be too hard also. Your data set is too big and uncorrelated to do what you want. I suggest you rethink entirely what you're trying to do.

>There was another method proposed where you started by assuming an even distribution, eg the values of the first row are all 1/n where n is the number of columns, but this only works if you have the same number of rows and columns, and I don't.
No, that method doesn't work at all. I thought I already told you that.
>>
>>8065362
Nah, I think you came up with that one first and then realised it wouldn't work, but you didn't explicitly state it. I considered it as well after I worked out that the other method wouldn't work for such a large dataset.

Yeah, it's not too much of a problem. I'll just have to say that I needed more information to achieve what I wanted to achieve. Thanks for all your help, anon.
>>
>>8065324
wtf do the numbers on the x-axis mean?
>>
You have data a_{ij} and b_{ik}, and
want to find probabilities
p_i, q_j and r_k such that
p_i q_j = a_{ij} and
p_i r_k = b_{ik}. Take logs to get
linear constraints:
log(p_i) + log(q_j) = log(a_{ij})
log(p_i) + log(r_k) = log(b_{ik})
Now find the minimum of the quadratic
E^2 = \sum_i log(p_i)^2
+ \sum_j log(q_j)^2
+ \sum_k log(r_k)^2
subject to the constraints. This
will just require solving a system
of linear equations. This
will give you an estimate for
the unknowns p_i, q_j and r_k.

A bit hacky, but might give you something that's reasonable and readily calculated.
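A rough sketch of the log trick with NumPy, under a few assumptions: every a_{ij} and b_{ik} is strictly positive, and the constraints are fitted in the least-squares sense rather than imposed exactly, since real data will usually not factor perfectly. `np.linalg.lstsq` returns the smallest-norm solution when several fit equally well, which plays the role of minimising E^2. The function name and the test data are made up:

```python
import numpy as np


def fit_log_factors(a, b):
    """Least-squares fit of log p_i + log q_j = log a_ij and
    log p_i + log r_k = log b_ik.

    a has shape (I, J), b has shape (I, K); all entries must be > 0.
    Returns the arrays (p, q, r).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_i, n_j = a.shape
    n_k = b.shape[1]
    n_unknowns = n_i + n_j + n_k  # unknown order: log p, then log q, then log r

    rows, rhs = [], []
    for i in range(n_i):
        for j in range(n_j):
            row = np.zeros(n_unknowns)
            row[i] = 1.0
            row[n_i + j] = 1.0
            rows.append(row)
            rhs.append(np.log(a[i, j]))
        for k in range(n_k):
            row = np.zeros(n_unknowns)
            row[i] = 1.0
            row[n_i + n_j + k] = 1.0
            rows.append(row)
            rhs.append(np.log(b[i, k]))

    # The system is rank deficient (p can be scaled up while q and r are
    # scaled down), so lstsq picks the minimum-norm solution among the best fits.
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    log_p, log_q, log_r = np.split(solution, [n_i, n_i + n_j])
    return np.exp(log_p), np.exp(log_q), np.exp(log_r)


# Hypothetical data that factors exactly, so the fit can be checked.
p_true = np.array([0.5, 0.3, 0.2])
q_true = np.array([0.6, 0.4])
r_true = np.array([0.7, 0.2, 0.1])
a = np.outer(p_true, q_true)
b = np.outer(p_true, r_true)

p, q, r = fit_log_factors(a, b)
print(np.outer(p, q))  # should reproduce a
print(np.outer(p, r))  # should reproduce b
```

Only the products p_i q_j and p_i r_k are pinned down. The individual p, q and r come out with an arbitrary overall scale and would need renormalising before being read as probabilities.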
>>
>>8066163
You could also give up the independence assumption and just try minimizing

[math]E^2=\sum_{ijk} p^2_{ijk}[/math]

subject to

[math]\sum_k p_{ijk}=a_{ij}[/math] and [math]\sum_j p_{ijk}=b_{ik}[/math] and [math]\sum_{ijk}p_{ijk}=1[/math].


Again, you will get a linear system. You might have to adjust a bit to get positivity.
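A sketch of this second suggestion. If I have the algebra right, the Lagrange conditions force p_{ijk} to be a sum of a term depending on (i, j) and a term depending on (i, k), and the candidate p_{ijk} = a_{ij}/K + b_{ik}/J - s_i/(JK), with s_i = \sum_j a_{ij} = \sum_k b_{ik}, has that form and meets both constraints, so no solver is needed. Treat the closed form as an assumption to check. The tables are invented:

```python
def min_norm_joint(a, b):
    """Smallest sum-of-squares p[i][j][k] with sum_k p = a[i][j] and
    sum_j p = b[i][k], ignoring positivity.

    a is an I x J table and b is an I x K table. For each i both tables must
    have the same row total. Entries of the result may be negative.
    """
    p = []
    for a_row, b_row in zip(a, b):
        n_j, n_k = len(a_row), len(b_row)
        row_total = sum(a_row)
        if abs(row_total - sum(b_row)) > 1e-9:
            raise ValueError("row totals of a and b disagree")
        p.append([
            [a_ij / n_k + b_ik / n_j - row_total / (n_j * n_k) for b_ik in b_row]
            for a_ij in a_row
        ])
    return p


# Hypothetical tables; each row i has the same total in a and in b.
a = [[0.30, 0.20], [0.10, 0.40]]
b = [[0.25, 0.15, 0.10], [0.05, 0.20, 0.25]]

p = min_norm_joint(a, b)
for i, block in enumerate(p):
    print(i, [[round(x, 4) for x in row] for row in block])
```

In this toy example at least one entry comes out negative, which is the positivity problem mentioned above. Clipping and rescaling is one crude fix, but it no longer matches the margins exactly.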


This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.