Algorithms

Thread replies: 10
Thread images: 1

Anonymous
Algorithms 2016-12-23 12:22:36 Post No. 8558387
File: randomWalk.png (187KB, 784x576px)

Get in here if you like algorithms. Basic discussion and any thoughts about algorithm design, implementation, problems, etc. are all welcome.

I'll start with a question to get you in the mood.

Let's suppose there is a city map and I set a random starting point somewhere on it.
Is it possible to design a path structure (such as go left then right then straight for 2 blocks and left again) that will ensure that I am getting a simple random sample (equal selection probability / inclusion probability > 0 for all i in n) out of all city households?
If so, what would the algorithm look like?

Anonymous 2016-12-23 12:38:51 Post No.8558396

>>8558387
Easy.
Works on both regular grids and arbitrary graphs:

1. Let n = number of possible nodes to move to next.
2. Roll 1 n-sided die
3. Move one step in the direction suggested by 2.
4. Go to 1.
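
A minimal sketch of the four steps above, assuming the map is stored as an adjacency dict. The `graph` layout, the `random_walk` name and the 2x2 grid are illustrative assumptions, not something from the thread:

```python
import random

def random_walk(graph, start, steps, rng=random):
    """Walk `steps` moves from `start`, choosing each next node uniformly."""
    path = [start]
    node = start
    for _ in range(steps):
        neighbors = graph[node]          # step 1: n = len(neighbors)
        node = rng.choice(neighbors)     # step 2: roll an n-sided die
        path.append(node)                # step 3: move one step
    return path                          # step 4 is the loop itself

# Hypothetical 2x2 block grid: each corner connects to two others.
grid = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
print(random_walk(grid, "A", 5, random.Random(0)))
```

Passing a seeded `random.Random` only makes the walk reproducible; the walk itself does not depend on it.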

Anonymous 2016-12-23 12:57:14 Post No.8558413

>>8558396
That is a correct solution. Exactly what I had in mind.

Let's modify the task a little to make things harder. Realistically speaking, there is more than one person in each household. Suppose you want to generate a random sample of the people living in the city, not a random sample of the households. There might be different numbers of people in said houses.

How do you proceed now?

Anonymous 2016-12-23 01:13:10 Post No.8558434

>>8558413
Naive answer but often good enough in practice: weight the sides of the die in proportion to the number of people in each household.
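
A rough sketch of the weighted die, assuming household sizes are known up front. The `people` dict, the `weighted_step` name and the numbers are made up for illustration:

```python
import random

def weighted_step(graph, people, node, rng=random):
    """Pick the next node with probability proportional to its household size."""
    neighbors = graph[node]
    weights = [people[n] for n in neighbors]
    return rng.choices(neighbors, weights=weights, k=1)[0]

graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
people = {"A": 2, "B": 1, "C": 3}   # made-up household sizes
rng = random.Random(0)
counts = {"B": 0, "C": 0}
for _ in range(10_000):
    counts[weighted_step(graph, people, "A", rng)] += 1
print(counts)  # C should come up roughly three times as often as B
```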

Also, I realized that I was technically incorrect in saying that the method works for arbitrary graphs, since the graph structure would cause the probability of reaching an isolated node (say) to be lower than that of an equivalent centralized node.
And we would probably have to require at the very least that the graph be connected.

One possible approach to this might be to draw up the adjacency matrix M of all nodes in the graph, let p = (p1,...,pN) be the desired probability distribution of node hits, replace each of the '1' entries in M with an unknown weight, and solve the eigenvector equation Mp = p (which requires |M - I| = 0) for a set of non-negative weights to use at each junction, with the weights leaving each node summing to 1, such that the long-run sample probabilities equal the Perron-Frobenius eigenvector p.

That said, I'm not sure if the coefficients can always be found for connected graphs (they clearly cannot for disconnected graphs), or if finding such coefficients could be done in a more tractable way.
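
One more tractable route, which is a different technique from the matrix equation above, is a Metropolis-Hastings random walk: propose a uniformly random neighbor v of the current node u and accept the move with probability min(1, p(v)·deg(u) / (p(u)·deg(v))), otherwise stay put. On a connected undirected graph this keeps the target distribution p stationary. The sketch below builds the resulting transition matrix with exact fractions and checks stationarity on a hypothetical star-shaped map; all names are illustrative, and the graph is assumed to have no self-loops:

```python
from fractions import Fraction

def mh_transition_matrix(graph, target):
    """Row-stochastic matrix P[u][v] of a Metropolis-Hastings walk on `graph`."""
    P = {u: {v: Fraction(0) for v in graph} for u in graph}
    for u, nbrs in graph.items():
        for v in nbrs:
            # Propose v with probability 1/deg(u), then accept or stay put.
            ratio = (target[v] * len(nbrs)) / (target[u] * len(graph[v]))
            P[u][v] = Fraction(1, len(nbrs)) * min(Fraction(1), ratio)
        P[u][u] = 1 - sum(P[u].values())   # rejected proposals stay at u
    return P

# Hypothetical star-shaped map: hub "H" with three dead-end streets.
graph = {"H": ["X", "Y", "Z"], "X": ["H"], "Y": ["H"], "Z": ["H"]}
target = {n: Fraction(1, 4) for n in graph}   # want every node equally often

P = mh_transition_matrix(graph, target)
after_one_step = {v: sum(target[u] * P[u][v] for u in graph) for v in graph}
print(after_one_step == target)   # True when `target` is stationary
```

The price is that the walker sometimes stays where it is, so a dead-end node gets "visited" several times in a row to make up for being hard to reach.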

Also, it's late where I am and I'm going off to sleep now. Good luck with your thread and hopefully it generates some interesting discussion before I check back.

Anonymous 2016-12-23 01:14:36 Post No.8558437

>>8558413
>Suppose you want to generate a random sample of the people living in the city, not a random sample of the households. There might be different amounts of people in said houses.
>How do you proceed now?

If you accept >>8558396's solution as a valid solution for the previous problem then it is also a valid solution for this problem.

Why? Because who lives in a house is completely random. You cannot look at a house and predict from its size or color who lives there. It is all random. You cannot really predict it.

Anonymous 2016-12-23 01:28:07 Post No.8558450

>>8558437
If you're assuming that there are no systematic geographic variations in said population, you would be right.

It's not random though, practically speaking. Lower income houses, which are concentrated in some parts of a city (our graph), would have more people living in one household, for example. By taking a random household, you would put a higher probability on lower income people, generating bias in your sample.

That being said, you and the other guy might be right about the initial solution being wrong. But if it is indeed wrong, we'd still need an algorithm for that one before we get to the next step.

Anonymous 2016-12-23 03:59:31 Post No.8558605

No one else got any ideas? Thought there was a ton of computer science / math majors on here.

Anonymous 2016-12-23 04:09:55 Post No.8558615

>>8558605
>No one else got any ideas? Thought there was a ton of computer science / math majors on here.

Well, the problem with the past solution is that the starting position would cause a bias in the selection, as households closer to the starting point have a significantly higher chance of getting visited.

The real solution would be to count all the nodes, call that N.

Then assign a number from 1 to N to each household, completely randomly. Like first label the houses from 1 to N as you count them and then "shuffle" that order like you would shuffle a deck of cards.

Then to pick the first house, roll an N-sided die, get a number k and go to the house numbered k. Then remove k from the "deck" and pick a new k.

Repeat as many times as you want to get your random sampling.
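
The shuffle-and-draw idea can be sketched like this, assuming a complete list of households already exists. The `sample_households` name and the `houses` list are placeholders:

```python
import random

def sample_households(households, k, rng=random):
    """Draw k distinct households, each subset of size k equally likely."""
    deck = list(households)   # label the houses as we count them
    rng.shuffle(deck)         # shuffle the labels like a deck of cards
    return deck[:k]           # dealing from the top never repeats a house

houses = [f"house-{i}" for i in range(1, 11)]   # made-up list of N = 10
print(sample_households(houses, 3, random.Random(0)))
```

Python's standard `random.sample(houses, 3)` draws without replacement in the same way, so the explicit shuffle is only there to mirror the description.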

This is the only way to ensure that we do not fall into the closeness bias I described before, but it makes the starting point basically pointless. Maybe that's unavoidable, because having the starting point is what causes the problem, so it seems like I'm just patching up the issue.

Anonymous 2016-12-23 05:44:11 Post No.8558736

>>8558615
>Well, the problem with the past solution is that the starting position would cause a bias in the selection, as households closer to the starting point have a significantly higher chance of getting visited.

That's not a problem because the starting point is random. If you ran the algorithm from every possible starting point once and it reached every household the same number of times, you could use any of the starting points + the algorithm as an equal-probability sample.

That being said, there are different ways of choosing the households. You could take every single household along the algorithm's path, or just every second or third, etc. Normally you would also want to not include the same household twice.

This is a not-so-well-known problem in survey methodology. People just assumed that some arbitrary routes they came up with off the top of their heads would lead to equal-probability samples. It's been shown that they don't (I could search for the articles again if someone's interested enough). There are not a lot of computer scientists / actual mathematicians in that field though, so I am giving it a try here.

I got a different idea now (that's loosely based on yours).

What if I numbered all households 1 to N, took a random starting point k out of those, and a second random "end" point j? Now I use a different algorithm to find all routes from k to j and pick a random one of those. The interviewer will be sent on that journey, sampling all units on the way (or every second or third as described before). If the walk is not long enough, do the procedure again.

Am I stupid as hell or would that actually produce a simple random sample?
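
One way to probe that question is to compute the inclusion probabilities exactly on a tiny map. The sketch below assumes k ≠ j, counts only routes that never revisit a house (otherwise there are infinitely many), and samples every unit on the route; the helper names and the three-house street are made up:

```python
from fractions import Fraction
from itertools import permutations

def simple_paths(graph, start, end, seen=()):
    """Yield every path from start to end that visits no node twice."""
    seen = seen + (start,)
    if start == end:
        yield seen
        return
    for nxt in graph[start]:
        if nxt not in seen:
            yield from simple_paths(graph, nxt, end, seen)

def inclusion_probabilities(graph):
    """Exact chance each node lies on the route, for uniform (k, j) and route."""
    pairs = list(permutations(graph, 2))          # ordered pairs, k != j
    prob = {n: Fraction(0) for n in graph}
    for k, j in pairs:
        routes = list(simple_paths(graph, k, j))
        for route in routes:
            for node in route:
                prob[node] += Fraction(1, len(pairs) * len(routes))
    return prob

# Hypothetical street of three houses in a row: A - B - C.
street = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(inclusion_probabilities(street))
```

On this toy street the middle house lies on every route while the end houses do not, so the inclusion probabilities come out unequal. That is a single small case under the stated assumptions, not a verdict on real city maps.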

AcbnPhone 2016-12-23 06:42:59 Post No.8558821

Algorithms SUCK, kid
