Class MSBRadixSorter

  • Direct Known Subclasses:
    StableMSBRadixSorter, StringMSBRadixSorter

    public abstract class MSBRadixSorter
    extends Sorter
    Radix sorter for variable-length strings. This class sorts based on the most significant byte first and falls back to IntroSorter when the size of the buckets to sort becomes small.

    This algorithm is NOT stable. Worst-case memory usage is about 2.3 KB.

    • Field Detail

      • histograms

        private final int[][] histograms
      • endOffsets

        private final int[] endOffsets
      • commonPrefix

        private final int[] commonPrefix
      • maxLength

        protected final int maxLength
    • Constructor Detail

      • MSBRadixSorter

        protected MSBRadixSorter​(int maxLength)
        Sole constructor.
        Parameters:
        maxLength - the maximum length of keys, pass Integer.MAX_VALUE if unknown.
    • Method Detail

      • byteAt

        protected abstract int byteAt​(int i,
                                      int k)
        Return the k-th byte of the entry at index i, or -1 if its length is less than or equal to k. This may only be called with a value of i between 0 included and maxLength excluded.
      • getFallbackSorter

        protected Sorter getFallbackSorter​(int k)
        Get a fall-back sorter which may assume that the first k bytes of all compared strings are equal.
      • compare

        protected final int compare​(int i,
                                    int j)
        Description copied from class: Sorter
        Compare entries found in slots i and j. The contract for the returned value is the same as Comparator.compare(Object, Object).
        Specified by:
        compare in class Sorter
      • sort

        public void sort​(int from,
                         int to)
        Description copied from class: Sorter
        Sort the slice which starts at from (inclusive) and ends at to (exclusive).
        Specified by:
        sort in class Sorter
      • sort

        protected void sort​(int from,
                            int to,
                            int k,
                            int l)
      • introSort

        private void introSort​(int from,
                               int to,
                               int k)
      • radixSort

        private void radixSort​(int from,
                               int to,
                               int k,
                               int l)
        Parameters:
        k - the character number to compare
        l - the level of recursion
      • assertHistogram

        private boolean assertHistogram​(int commonPrefixLength,
                                        int[] histogram)
      • getBucket

        protected int getBucket​(int i,
                                int k)
        Return a number for the k-th character between 0 and HISTOGRAM_SIZE.
      • computeCommonPrefixLengthAndBuildHistogram

        private int computeCommonPrefixLengthAndBuildHistogram​(int from,
                                                               int to,
                                                               int k,
                                                               int[] histogram)
        Build a histogram of the number of values per bucket and return a common prefix length for all visited values.
        See Also:
        buildHistogram(int, int, int, int[])
      • buildHistogram

        private void buildHistogram​(int from,
                                    int to,
                                    int k,
                                    int[] histogram)
        Build an histogram of the k-th characters of values occurring between offsets from and to, using getBucket(int, int).
      • sumHistogram

        private static void sumHistogram​(int[] histogram,
                                         int[] endOffsets)
        Accumulate values of the histogram so that it does not store counts but start offsets. endOffsets will store the end offsets.
      • reorder

        protected void reorder​(int from,
                               int to,
                               int[] startOffsets,
                               int[] endOffsets,
                               int k)
        Reorder based on start/end offsets for each bucket. When this method returns, startOffsets and endOffsets are equal.
        Parameters:
        startOffsets - start offsets per bucket
        endOffsets - end offsets per bucket