
Non-Comparison Sorting Algorithm


Thread replies: 76
Thread images: 7

I've been working on this for a while now and I'm very careful about the words I use to describe this technique I have discovered.

It's not just new, it's faster than anything I've tested and it's very simple to implement.

What is the best way to approach getting real input on this? I'm at a university studying Computer Science with a more than mild dose of imposter syndrome.

I plan on just writing a journal article discussing the algorithm, comparing it to others, and conducting a performance test. Then I'll present the article to one of the faculty and hopefully they'll take me seriously.

I am very sure that this is at least a different approach to sorting an unordered list. It may have similarities to another sorting algorithm. It does not use comparisons.
>>
I've found applications that describe central tendency and compression that I want to talk about. The algorithm produces something that I can use to describe the list other than the order.

I don't know how to mathematically formalize the algorithm.

I'm also an undergraduate with no intention of going to graduate school but I want to tell someone about what I've discovered. I would gladly just tell this board the procedure but I'm not sure if that's the best idea.
>>
File: IMG_4156.jpg (1MB, 2399x1920px)
>>8405952
You sure you didn't just reinvent radix sort?
>>
Just go talk to one of your professors
>>
>>8405968
it's similar to radix sort but it's not focused on grouping

it's most similar to bucket sort in my opinion
>>
File: 1473804728095.png (138KB, 350x350px)
>>8405961
>I don't know how to mathematically formalize the algorithm.
>I'm also an undergraduate
>>
>OP reinvented sleepsort
snooze
>>
>I would gladly just tell this board the procedure but I'm not sure if that's the best idea.
Why not? I would be glad to review your ideas.
>>
>>8405998
>review
Sorry, fucking auto correct. I meant to say steal.
>>
>>8406021
The probability of there being anything to steal is very low. The limitations on sorting are well known.
>>
>>8405952
reddit: the post
>>
What are you hoping to achieve here OP? Either tell us what it is or don't make a thread about it.
>>
File: monkey.jpg (2MB, 2814x2109px)
Alright guys, I'll tell you what it is.

1. Place a banana at the end of a long track with n buckets.
2. Make n monkeys where the nth monkey's strength is proportional to the nth number.
3. Have the monkeys duke it out until the biggest one is closest to the banana, the next biggest one is next to that, etc.

Done, and in O(n) time. (One for each step of monkey fabrication (this can also be parallelized). And the monkeys are only so big, so they will get tired out after a constant amount of time.)

I call it ApeSort (not to be confused with "monkey sort" aka Bogosort).
>>
>>8406068
that's just sleepsort numbnuts
>>
>>8406068
>not to be confused with "monkey sort" aka Computer scientist sort
>>
Fine, here is the algo

>given a set, `a`, of `n` unordered integers
>SUM (i = 1 -> n) { 2^(a_i) } = `key`
With this key, we can find a set of n integers in descending order

>1. While `key` is non zero,
>2. Take the floor of log_2(`key`). This is our a_i element
>3. `key` = `key` - log_2(a_i)

The `key` has applications other than recovering the sorted version of the original array. It can also be a measure of central tendency of the original array if we compare it to the nearest power of 2. It also has potential applications in compression, since the `key` is shorter than the actual array.

Do I have something?
>>
>>8406146
>T(2n)
The efficiency of the algorithm is O(n) with a large memory consumption
>>
>>8406146
>Do I have something?

Yes, you have an honorable mention in my new PhD Thesis

>Thanks to a random 4chan poster for giving me support while I was inventing this new algorithm
>>
File: 281.jpg (28KB, 500x491px)
Neat
>>
>>8406146
for step 3, it's actually
>3. `key` = `key` - log_2(`key`)
or
>3. `key` = `key` - 2^(a_i)

I'm not entirely sure which method is better

There are limitations to the algorithm but they're not severe. I think the bare method works for all positive integers and can handle duplicates.

I think negative integers work if you use the first step 3 revision with - log_2(`key`)

It's simple and short. I tested it against mergeSort and on average it's 203% faster. I want to find an implementation that works with the TeraSort dataset, but the `key` causes an overflow since even doubles don't have enough precision.
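For reference, the revised procedure can be sketched in Python, whose arbitrary-precision integers sidestep the double-overflow problem (a minimal sketch, assuming distinct non-negative integers; duplicates collapse into a single bit of the `key`):

```python
def key_sort(a):
    # Build the key: adding 2**x sets bit x (assumes each x is distinct,
    # otherwise carries create phantom values).
    key = 0
    for x in a:
        key += 1 << x  # 1 << x == 2**x
    # Recover elements in descending order: repeatedly take
    # floor(log2(key)), the highest set bit, and clear it.
    out = []
    while key:
        hi = key.bit_length() - 1  # floor(log2(key))
        out.append(hi)
        key -= 1 << hi             # revised step 3: key -= 2**a_i
    return out

print(key_sort([3, 1, 4, 0, 9]))  # [9, 4, 3, 1, 0]
```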
>>
took it to my professor and he is helping to get me published

thanks for the free algorithm,

brainlet
>>
>>8406209
>I want to find an implementation that works with the TeraSort dataset but the `key` causes an overflow since even doubles don't have enough precision

What is the problem with doubles? Does your algorithm only work with integers?
>>
I'm only asking for people's input. I'm going to continue looking at the algo but I need some confirmation that it's even worth looking at. I've done a light literature review but came up with nothing similar other than bucket and counting sort.

I'm planning on publishing it but it would only be my second time and I don't know whether it's worth the time and money. My university doesn't really care since I shared it with a graduate student.

I have to approach this correctly because I have a history of mental illness that my university knows about and I don't know if I'm just crazy. Where I live, if you have a history of mental illness you can get jailed
>>
>>8406223
doubles are not big enough because with increasing array size you can't store the `key` in an integer or a double. In some cases, you are raising 2 to the 100000th power.

I could probably do it in C or C++ with some good memory allocation but I might just store the `key` on the hard disk
>>
If you tested the algorithm with the best case...
learn how to test, pleb
>>
>>8406218
>>8406179
>>8406164

If your professors are interested, ask about its applications in quantum computing because you can theoretically store the `key` in a qubit that has a size of at least max(a) bits but I don't have working knowledge of quantum computing
>>
>>8406230
Create your own number type.

struct bignumber {
long long bits;  // was `bits:` -- a typo'd colon
bool* number;
};

bignumber new_bignumber(int size) {
bignumber b;
b.bits = size;
b.number = new bool[size];  // was `number = ...`, which doesn't compile
return b;
}

Something like that. The problem is that an n-bit bignumber will actually weigh n bytes (each bool takes a byte) and I have no idea how you would avoid this problem. But it probably is not a problem.

You would also have to manually define arithmetic on it.
>>
>>8406146
Your loop is dependent on the key, but the key is not dependent on the length of the input, only on its maximal number. Here is what you are essentially doing (without floating point numbers):
1. Find the maximum number M
2. Create an array B of length M+1, initialized to 0
3. Go over the unsorted list L and set B[L[i]] to 1
4. Go over B and output all indices which are set to 1
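Those steps can be written out directly (a minimal sketch, assuming distinct non-negative integers; with duplicates you'd count occurrences instead of setting a flag, which is counting sort):

```python
def flag_sort(lst):
    m = max(lst)                 # 1. find the maximum M
    flags = [0] * (m + 1)        # 2. array of length M+1, init to 0
    for x in lst:
        flags[x] = 1             # 3. mark each value seen
    return [v for v in range(m + 1) if flags[v]]  # 4. sweep flags in order
```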
>>
>>8406234
best case works but pedant academics are going to ask about worst case
>>
>>8406242
Well... that's the idea, you don't always have the best-case scenario
>>
>>8405961
>I would gladly just tell this board the procedure but I'm not sure if that's the best idea.
Then do it, or delete your shit thread.
>>
for x in input:
    count[key(x)] += 1

# calculate the starting index for each key:
total = 0
for i in range(k):  # i = 0, 1, ... k-1
    oldCount = count[i]
    count[i] = total
    total += oldCount

# copy to output array, preserving order of inputs with equal keys:
for x in input:
    output[count[key(x)]] = x
    count[key(x)] += 1

return output


Is this it, OP?
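The fragment above leaves `count`, `output`, and `k` implicit; a self-contained version (standard counting sort, with `key` defaulting to the identity):

```python
def counting_sort(items, key=lambda x: x):
    # k = number of distinct key values, 0 .. max key
    k = max(key(x) for x in items) + 1
    count = [0] * k
    for x in items:
        count[key(x)] += 1
    # turn counts into starting indices
    total = 0
    for i in range(k):
        count[i], total = total, total + count[i]
    # stable copy into the output array
    output = [None] * len(items)
    for x in items:
        output[count[key(x)]] = x
        count[key(x)] += 1
    return output

print(counting_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```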
>>
>>8406241
you can perform linear calculations on the `key` version

>>8406236
the whole point is using a new integer `key` that can have O(1) operations instead of iterations. I understand the Boolean array and that's how I found it. It works with any arbitrary base but 2 is the most stable
>>
>>8406249
so this is why I am having a little bit of trouble with the algorithm because of its similarities with counting sort

but using an integer or a data structure or just the hard disk improves processing time and you create something really interesting

would you read about something like this?
>>
>>8406242
Oh, the best case is when you don't have an array?

So it's like O(0)
>>
>>8406260
Would I get away with something like that?
If that's the case then I think it warrants a submission to a journal
>>
>>8406257
It's a proof that you can't go lower than n log n without counting.
You can count, but you need more space on HDD/RAM.
Do an N! array list beforehand and check all the comparisons using a quicksort, easy.
It will be really quick to sort something... but you will need like 2 GB of array cases on disk

Memory > Efficiency
>>
>>8406250
Your computations aren't linear, you have log and exp. And log_2 is expensive; just build your own data structure for big integers and determine the complexity of the log function. You pushed the complexity of traversing the array up to the maximal number into computing the next node.
>>
>>8406146
Did you put any thought into the space complexity of your "algorithm"?
>>
>>8405952
Isn't this literally impossible?

Also thanks for reminding me I should listen to/jerk off to this slut again.
>>
>>8406270
I see, I'll have to look into the cost of log_2

What I meant about linear computation is that you have an integer representation of an array which can potentially be used for further analysis
>>
>>8406287
You can save time by having more information about the items you want to sort

That doesn't mean that you need 2487289 heptabytes to do a simple sort
>>
>>8406287
Yes I got what you meant. But representation trickery often just adds hidden complexity when you assume the transformation is an oracle.
>>
>>8406293
I can always use multiple decimal128 structures
https://en.wikipedia.org/wiki/Decimal128_floating-point_format
>>
>>8406301
Thats what she said
>>
>>8406301
>this is a CS undergrad
Learn how computer memory works until you understand why your idea does not work the way you think it would and why it is not feasible.
>>
>>8406355
How much understanding would I need? Up to AND/OR/XOR gates? Or down to the physics of transistors?
>>
>>8406369
>what is a bit
>how is data stored in memory
>how is raising 2 to the power of anything performed in memory
>how is taking the log base 2 of anything performed in memory
>do I really need these expressions to describe what I'm doing
>how much space would my program have to take up
>>
>>8406382
Thank you
>>
>>8406382
Also:
>what happens if elements in my unsorted list are not unique
>>
>>8406146
>SUM ( i -> n) { 2^(a_i) } = `key`

Your algorithm works in exponential time. The inputs are numbers [math]a_{i}[/math] of length [math]\log(a_{i})[/math]. You calculate [math]2^{a_{i}}[/math], which takes exponential time and space in terms of the input length (reminder: the input length is [math]\sum \log(a_{i})[/math], not [math]n[/math]).

Exponentiation takes exponential time and space, unless you do modular exponentiation.

QED brainlet
>>
>>8406497
>You calculate [math]2^{a_i}[/math], which takes exponential time
brainlet detected
>>
>>8406578
Show me an algorithm for calculating 2^n which doesn't require exponential time, idiot.

I'd like to remind you that "exponential time" refers to an exponential number of steps in the length of the input.

Input for exponentiation: a number of log(n) bits length
Output for exponentiation: 2^n, a number of n bits length.

n = 2^log(n)

See brainlet? No matter how you try exponentiation, your output has to have exponentially more bits than your input, so it must take AT LEAST exponential time to do it, because putting the result on the output already takes exponential time.
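The blowup in output length is easy to check directly with Python's arbitrary-precision integers: writing down n takes about log2(n) bits, while writing down 2^n takes n+1 bits.

```python
n = 1_000_000
input_bits = n.bit_length()          # ~20 bits to write n itself
output_bits = (1 << n).bit_length()  # 1,000,001 bits for 2**n
print(input_bits, output_bits)       # 20 1000001
```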
>>
>>8406497
you just shift a 1 left [math]a_{i}[/math] times to calculate [math]2^{a_{i}}[/math] though. That's linear with the maximum value of an element.
>>
>>8406146
What happens if you have two elements with the same value? I realize that a set disregards duplicates, but not being able to sort an array like [0, 5, 5, 2] is a pretty substantial caveat. You'll also run into trouble if you need to sort a number larger than, say, 64 or so on many architectures.

Essentially, your algorithm sets a bit in `key` which corresponds to each element you want to sort (by the exponential-log to-from mapping you've got there), and then builds a new array by populating it with each number which would set a bit in `key`. Kind of a neat way to approach a sort, but not generally useful without substantial assumptions about your data set.

It has been well-known for a long time that sorting can be more efficient than Ω(nlogn) as long as the data to sort holds certain properties. Any time you notice exploitable patterns in the inputs to sort, by all means, exploit them! However, comparison sorting is ubiquitous for a reason: it's highly adaptable.
>>
>>8406146

would be interesting if you wrote it in prolog, because everything is integers in prolog.

I think your algorithm is interesting, but not useful. I'm not 100% certain but I think I've seen this before.
>>
>>8406645
see this post >>8406886
even exponentiation with an arbitrary base b can be done in [math]\mathcal{O}(\log b)[/math] time
now fuck off
>>
this is OP
i just want people to know about this. it may not be a perfectly new algo but it's a new way to do it

I wanted to share because im going to kill myself soon (after december) so I want to contribute as much as possible
>>
>>8407090
yes, surely killing yourself will help achieve the aim of contributing as much as possible
>>
>>8407090
dont do it
>>
>>8406209
>>8406146
>>8405952

anon, basically what this is is a sparse sort

here's a JS implementation

https://jsfiddle.net/vuhfbq6u/

[329, 235, 234, 234, 543, 123, 435]
  .reduce((k, a) => { (!k[a]) && (k[a] = []); k[a].push(a); return k; }, [])
  .filter((i) => { return (i) })
  .reduce((a, b) => { return a.concat(b); })

It's not really different from the meme sleep sort in principle.

You are making a huge array (encoded as your "key") which is basically just a bunch of zeroes with 1s at the positions where the elements are.
>>
>>8407150
too lazy to read thread, just skimmed it

but lol if someone thinks using infinite space to sort something is "fast".

If we could use infinite space for free then P=NP would be true trivially. Looking into infinite-space algorithms is pretty common though and not something to be embarrassed about.
>>
Another anon in this thread said what I'm about to say, but I'm going to put it into terms that you might better understand given your level of knowledge.

Let's say that I want to sort an array of ten unsigned 32-bit integers. Let's say that the maximum value in this array is the maximum integer value, 2^32 - 1 = 4294967295.

When I compute the 'key' for this element, I need to evaluate 2^4294967295. How many bits do I need to store this value? The number of bits I need is log base 2 of this value, 4294967295. That's a lot of bits to have to store, and I need to take at least one time step to write each of those bits. Given your algorithm and arrays consisting of 32-bit integers, I need to take potentially 4294967295 steps per element in the array; that's on the order of seconds per element.

Now, let's quantify a lower bound on the time complexity (what you call efficiency) of your algorithm given just this detail. It seems you're making a simple rookie mistake in assuming that the parameter n refers simply to the number of elements in your array; this is only true when you fix the bit-length of elements in the array (in the previous example, 32 bits). For a general-case sorting algorithm, n refers to the total number of bits required to represent the array when the maximum size of any element has no limit. In the general case, the actual n is greater than or equal to the maximum bit length of any element in your input, d.

Let's say now that I'm executing your algorithm on an input having a maximum element length of size d. Let's say that this is a large number (more 1s than 0s). I then need to evaluate 2^(2^d) (since the max value in d bits is 2^d, the max value in the array will potentially be 2^d). I need 2^d bits (and time steps) to write this value. Given that n >= d, we then know that your algorithm will take at least on the order of 2^n steps since it has to potentially handle the largest values in the range.
>>
>>8407219

Now, granted that what I've explained is essentially to say that your algorithm is an exponential-time sorting algorithm, it may have uses: for example, it may be more efficient for sorting arrays of values clustered around a single point with low variance.

I'm not going to dog on you for your simple mistakes. I think it's great that you're trying to apply creative thought and get a taste for what the more theoretical aspects of computer science have to offer. Keep trying!
>>
>>8407160

It's a well-known result in complexity theory that no algorithm having space complexity greater than polynomial in n can run in time polynomial in n.
>>
>>8405952
>>8406146

try inserting the same value twice, see what happens

spoiler: you suddenly have a new value you can't possibly explain where it came from.
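Concretely, a duplicate's two bits carry into a higher bit, producing exactly that unexplainable new value (here the two 5s merge into a phantom 6):

```python
# key for [0, 5, 5, 2]: 1 + 32 + 32 + 4 = 69 = 0b1000101
key = sum(2**x for x in [0, 5, 5, 2])
out = []
while key:
    hi = key.bit_length() - 1  # highest set bit
    out.append(hi)
    key -= 1 << hi
print(out)  # [6, 2, 0] -- a phantom 6 appears and both 5s are gone
```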
>>
>>8407219

Sorry; I glossed over some details that might not be readily apparent. The worst-case scenario for your algorithm given an input of length n is the case when it has only constant-many elements--say 10--each of large size d = n/10. The exact run-time in this worst case will look something like 2^(n/10), which is the same as 2^n asymptotically.
>>
>>8407226
What about sleep sort?
>>
>>8407259

it's virtually identical to sleep sort

on a single-thread machine, sleep sort is O(n!) or O(2^n) I believe
>>
>>8407259

Any algorithm requires at least as many steps as it does bits of space: this is because a write is counted as a step in every model of computation. If sleep sort requires more than a polynomial number of bits of space, then it requires more than a polynomial number of steps to complete.
>>
>>8407263

To OP: if you want a challenge, here's my suggestion:

Sleep sort is somewhat interesting in that it's a time-based sort. The drawback is that it needs to wait potentially omega(2^n) units of time (see discussion of worst-case unbounded inputs given an input of length n) for the largest element in the array.

Can you come up with a method which can, in time linear in the input, characterize a function f over those inputs for which the sorted order f(x_1) . . . f(x_n) respects the sorted order of x_1 . . . x_n while reducing the maximum f(x_i) to some value that is linear in the number of elements in the array? Rather than sleeping x_i seconds for each x_i, you could sleep f(x_i) seconds and output the array in order.

Or: can you characterize a similar randomized function which partially respects the order and has a maximum f(x_i) sub-linear in the number of elements in the array and which may produce a true order with high probability in strictly less than log n rounds?

If you work on these and want advice, just come back to /sci/. I lurk, and I'll give my input whenever I see you come up with anything.
>>
I think its limitation is that it's exponential in its memory use
>>
>>8406886
>you just shift

Yep. You "just shift"... Care to elaborate what happens beyond n = 32 (or 64 if you have 64 bit int)?

I'm offering you 10 BTC, if you write a program for me that takes a number in binary or decimal as input, and outputs 2^n in binary or decimal and has time complexity less than O(2^n). (n being the length of the input)

>>8407069
I'm offering you the same challenge, brainlet ;)
>>
>>8407263
didn't someone in that thread get it down to nlogn?
>>
>>8407632
>I'm offering you 10 BTC, if you write a program for me that takes a number in binary or decimal as input, and outputs 2^n in binary or decimal and has time complexity less than O(2^n). (n being the length of the input)
define "output"
void print_pow2(int n) { /* prints 2^n in binary: a 1 followed by n zeros */
int i;
printf("%c", '1');
for (i = 0; i < n; i++) {
printf("%c", '0');
}
}
inefficient and borderline retarded, but does the job and runs in O(n).
anything else is just a question of memory allocation/management
>>
>>8407632
python's interpreter will resize your container for you, so you can shift left until your system runs out of memory if you want.





This is a 4chan archive - all of the content originated from that site.
This means that RandomArchive shows their content, archived.