Coding Practice - Find the Majority Elements

Reading time ~4 minutes

Majority Elements (MEs)

Given sequence of length , the -majority elements of are defined as the elements in whose occurrences are more than . For example, if the sequence is , and , then the -majority elements are , since it is the only element whose occurrences are more than .

Methods to Find (MEs)

Basic Hashing Solution

The most direct solution to finding the majority elements would be using a hash table as counters for different elements, and then compare these counters with . The following give the pseudo codes.

# INPUT: sequence A, and theta
# OUTPUT: majority elements K
N = len(A)
counters = {}
for x in A:
    if counters.has_key(x):
        counters[x] += 1
    else:
        counters[x] = 1

K = []
threshold = int(theta * N)
for key, value in counters.iteritems():
    if value > threshold:
        K.append(key)

Let do some analysis on the above algorithm. First of all, let’s introduce some notations. Let be the set of alphabets, and be all the possible sequences that can be constructed by using the alphabets from . Let be a sequence. And , , , and is the number of unique elements in .

Obviously, there are at most majority elements in . However, in this basic hashing solution, we need space, and time. That is to say, when (this is the common case that we might encounter in real-life), we would waste a large amount of memory, because amongst all these counters, there are at most would be larger than . So, could we achieve space complexity?

Smarter Solution

Actually, it turns out we can achieve space solution by using similar ideas as Boyer–Moore majority vote algorithm, which is an space-efficient linear-time algorithm to find the majority element in an sequence.

This method was first proposed by Richard M. Karp, Scott Shenker, and Christos H. Papadimitriou in @KarpShenkerPapadimitriou2003 and Erik D. Demaine et al in DemaineLopezOrtizMunro2002. It works as follows. It increases the number of counters as encountering new elements when passing through the sequence, and whenever the number of counters reaches , it starts to decrease all the counters by one, and if any counter reaches , removing it. Therefore, it can guarantee that there are at most counters. It is easy to figure out if an element has occurrences more than , its counter must survive. Therefore, if we define as the set of elements whose counters survived after passing through all the elements in the sequence. The the set of majority elements must be a subset of . Therefore, by passing through the sequence again checking whether the occurrences of elements in exceed , we can obtain all the majority elements.

The following provides that pseudo codes.

# INPUT/OUTPUT the same as before

N = len(A)
counters = {}

M = int(1.0 / theta)
for x in A:
    if counters.has_key(x):
        counters[x] += 1
    else:
        counters[x] = 1

    if len(counters) > M:
        for key in counters:
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]

# check all survived keys in counters
for key in counters:
    counters[key] = 0

for x in A:
    if counters.has_key(x):
        counters[x] += 1

K = []
threshold = int(theta * N)
for key in counters:
    if counters[key] > threshold:
        K.append(key)

References

Useful Linux Software

Ubuntu Software Installation Continue reading

Useful user-defined LaTeX commands

Published on November 27, 2016