项目作者: slerpyyy

项目描述 :
Linear Time Byte Sorter
高级语言: C
项目地址: git://github.com/slerpyyy/bytesort.git
创建时间: 2019-07-09T18:04:57Z
项目社区:https://github.com/slerpyyy/bytesort

开源协议:

下载


Bytesort

Linear Time Byte Sorter

Bytesort is a scalable and heavily performance optimized sorting program. It uses histogram sort, a variation of bucket sort to sort all bytes from STDIN with a worst-case runtime of O(N).

This program exists purely as a proof of concept. It is meant more as a demonstration of a linear time sorting algorithm than a genuine tool for applications, where sorting large quantities of 8-bit numbers would be necessary, as I don’t expect there to be that many.

The Algorithm

This program is an implementation of the sorting algorithm, formally known histogram sort. The basic idea is that, if you generate the histogram from an array of integers and throw away the original array, the only piece of information you’re losing about the numbers in the original array is their order.

Step-by-Step Example

  1. Start with a string of letters:

    1. cdbacdda
  2. Count how often each letter appears:

    1. a:2 b:1 c:2 d:3
  3. Make a new string using the information collected above:

    1. aabccddd

Runtime Complexity

Both the generation of the histogram and the generation of the final string scale linearly to the length of the original string. As both processes run in sequence, the runtime of the entire sorting algorithm is proportional to n + n = 2n, which means histogram sort takes O(n) time.

Usage

  1. Usage: cat [FILE] | ./bytesort [OPTION]...
  2. Sorts all the bytes from STDIN and writes them to STDOUT.
  3. -h, --help Shows this message.
  4. -c, --compact Prints each character only once followed by the
  5. number of times it appears. (aaaaa --> a:5)
  6. -x, --hex Replaces each byte with it hex code in compact mode.
  7. -p, --printable Ignored all bytes outside the range of printable
  8. ASCII characters [0x21; 0x7E].
  9. -n, --newline Adds a newline to the end of the output.
  10. WARNING: This program may exit with an error when attempting
  11. to sort more than 18447 Petabytes of data.

The Algorithm

This program is an implementation of the sorting algorithm, formally known as histogram sort. The basic idea is that, if you generate the histogram from an array of integers and throw away the original array, the only piece of information you’re losing about the numbers in the original array is their order.

Step-by-Step Example

  1. Start with a string of letters:

    1. cdbacdda
  2. Count how often each letter appears:

    1. a:2 b:1 c:2 d:3
  3. Make a new string using the information collected above:

    1. aabccddd

Runtime Complexity

Both the generation of the histogram and the generation of the final string scale linearly with the length of the original string. As both processes run in sequence, the runtime of the entire sorting algorithm is proportional to n + n = 2n, which means histogram sort takes O(n) time.

Building

On Linux, you can build this program by running the following command:

  1. gcc bytesort.c -o bytesort -Wall -Ofast -funroll-loops

I’m not planning on adding Windows support any time soon.