[LeetCode] 274. H-Index

Given an array of citations (each citation is a non-negative integer) of a researcher, write a function to compute the researcher's h-index.
According to the definition of h-index on Wikipedia: "A scientist has index h if h of his/her N papers have at least h citations each, and the other N − h papers have no more than h citations each."
For example, given citations = [3, 0, 6, 1, 5], which means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively. Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, his h-index is 3.
Note: If there are several possible values for h, the maximum one is taken as the h-index.

Thought process:
The definition of h-index in the problem is confusing. Actually, according to Wikipedia, "the definition of the index is that a scholar with an index of h has published h papers each of which has been cited in other papers at least h times".
A naive solution is to sort the citations. Iterate through them in reverse order. If citations[i] < i + 1, break. The h-index is i.

Solution 1:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
public class Solution {
    public int hIndex(int[] citations) {
        Arrays.sort(citations);
        for (int i = citations.length - 1; i >= 0; i--) {
            int h = citations.length - i;
            if (citations[i] < h) {
                return h - 1;
            }
        }
        return citations.length;
    }
}

Time complexity: O(nlogn).

Is there a better way? Comparison-based sorting algorithms have a lower bound of O(nlogn). To do better, I need to use an O(n) sorting algorithm. In this case, counting sort. 
Counting sort requires the elements are within a certain range. In this problem, there is no upper limit for citations. How do I use counting sort? Note that although there is no upper bound for citations, there is an upper bound for h-index, length of array. Therefore, whenever I see an element that's larger than length, replace it with length.
Traditionally, counting sort calculates the counts of every element, creates a less array to make the sorting stable, and derives the sorted array based on the less array. In this problem, I don't need the sorting to be stable because I don't care the relative positions of equal citations. In fact, I even don't need the sorted citations, because I only need the sum of the citations in descending order, and I can get that as I iterate through the counts array.
Iterate through the counts backwards. Keep track of the sum of counts. The first index where sum >= index is the h-index.

Solution 2:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
public class Solution {
    public int hIndex(int[] citations) {
        int length = citations.length;
        int[] count = new int[length + 1];
        
        for (int citation : citations) {
            count[Math.min(citation, length)]++;
        }
        
        int sum = 0;
        for (int h = length; h >= 0; h--) {
            sum += count[h];
            if (sum >= h) {
                return h;
            }
        }
        return 0;
    }
}

Time complexity: O(n).

Comments

  1. "A naive solution is to sort the citations. Iterate through them in reverse order. If citations[i] < i + 1, break. The h-index is i."
    I know this is right. But can you prove it why? Or just explain it?

    ReplyDelete

Post a Comment

Popular posts from this blog

[LeetCode] 714. Best Time to Buy and Sell Stock with Transaction Fee

[LeetCode] 269. Alien Dictionary

[LeetCode] 631. Design Excel Sum Formula