kickstart-2019-H-T1-HIndex

2019年kickstart最后一轮，下午1点到4点。
真心感慨，自己要学的东西还有很多啊，
本文是第一题H-index，计算科研人员影响因子的一道题目。

Problem

It is important for researchers to write many high quality academic papers. Jorge has recently discovered a way to measure how impactful a researcher’s papers are: the H-index.

The H-index score of a researcher is the largest integer h such that the researcher has h papers with at least h citations each.

Jorge has written N papers in his lifetime. The i-th paper has Ai citations. The number of citations that each paper has will never change after it is written. Please help Jorge determine his H-index score after each paper he wrote.

Input

The first line of the input gives the number of test cases, T. T test cases follow. Each test case begins with a line containing N, the number of papers Jorge wrote.

The second line contains N integers. The i-th integer is Ai, the number of citations the i-th paper has.

Output

For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is a space-separated list of integers. The i-th integer is the H-index score after Jorge wrote his i-th paper.

Limits

Time limit: 50 seconds per test set.
Memory limit: 1GB.
1 ≤ T ≤ 100.
1 ≤ Ai ≤ 105.

Test set

Test set 1 (Visible)

1 ≤ N ≤ 1000.

Test set 2 (Hidden)

1 ≤ N ≤ 105.

Sample

// Input
2
3
5 1 2
6
1 3 3 2 2 15

//Output
Case #1: 1 1 2
Case #2: 1 1 2 2 2 3

In Sample Case #1, Jorge wrote N = 3 papers.
After the 1st paper, Jorge’s H-index score is 1, since he has 1 paper with at least 1 citation.
After the 2nd paper, Jorge’s H-index score is still 1.
After the 3rd paper, Jorge’s H-index score is 2, since he has 2 papers with at least 2 citations (the 1st and 3rd papers).

In Sample Case #2, Jorge wrote N = 6 papers.
After the 1st paper, Jorge’s H-index score is 1, since he has 1 paper with at least 1 citation.
After the 2nd paper, Jorge’s H-index score is still 1.
After the 3rd paper, Jorge’s H-index score is 2, since he has 2 papers with at least 2 citations (the 2nd and 3rd papers).
After the 4th paper, Jorge’s H-index score is still 2.
After the 5th paper, Jorge’s H-index score is still 2.
After the 6th paper, Jorge’s H-index score is 3, since he has 3 papers with at least 3 citations (the 2nd, 3rd and 6th papers).

思路

原始版思路

因为每次写完一篇文章都要给出一个结果，所以肯定每次有新的文章都要重新计算一次，而H-index的计算方式是说至少有h篇文章，然后每篇文章都至少有h的引用数，所以我就联想到上一轮kickstart的第一题，借鉴到相同思路，我们可以通过建立一个数组，保存对应的引用数目，

比如有N篇文章，那么引用最高为N，方便起见，我们建立一个N+1的数组（全部初始化为0）。0索引位置不再使用。然后每个位置代表有这个引用的文章有多少篇，然后在我寻找h-index的时候，只要当前位置寻找arr[i] >=i的第一个数字即可。

以[5,1,2]为例进行讲解。

首先建立一个4(也就是3+1)个元素的数组，[0,0,0,0]
第一个数字为5，那么1到3号索引位置元素都+1，变成[0,1,1,1];
这个时候从1号索引位置往前找第一个符合arr[i]>=i的元素，也就是1；
第二个数字为1，那么1号索引位置元素+1，变成[0,2,1,1];
这个时候从2号索引位置往前找第一个符合arr[i]>=i的元素，还是1；
第三个数字是2，那么1号到2号索引位置元素+1，变成[0,3,2,1];
这个时候从3号索引位置往前找第一个符合arr[i]>=i的元素，此时变成了2；
所以最终的结果也就是[1,1,2]

算法复杂度是O(n^2)，可以通过小case。

升级版思路

这个思路就会比较好，构造了一个multiset来存储元素，这里的一个好处就是他会对插入的元素进行自动排序，默认是从小到大的顺序，底层是用红黑树实现的，插入删除查找的复杂度都是lg(n)级别的。然后我们每次将读入的数字插入到multiset中，然后判断头元素（实际就是所有元素中最小的元素）是否小于容器内元素个数。如果小于，那么删除头元素。最后我们想要的h-index其实就是每一轮之后的容器内元素个数。

还是以[5,1,2]为例进行讲解。

开始插入5，此时s={5},然后5大于当前元素个数1，所以s保持不变s={5},所以第一轮结果为1；
然后插入1，此时s={1,5}, 此时1是小于当前元素个数2，所以要删除第一个元素，s变成{5},所以第二轮结果为1；
然后插入2，此时s={2,5},然后2要等于当前元素个数，所以s保持不变，s= {2,5},所以第三轮返回结果为2.
所以最终结果也就是[1,1,2]

算法的复杂度为O(nlgn)

代码

原始版

#include <cstdio>
#include <cstring>
#include <algorithm>
#include <vector>
#include <string>
#include <set>

using namespace std;
typedef long long LL;


int main(){
    int T;
    scanf("%d", &T);
    string res;
    for (int t = 1; t <= T; ++t) {
        int M;
        scanf("%d",&M);
        vector<int> arr(M+1,0);
  
        int x;
        arr[0] = 0;
        res ="";
        for(int i = 1;i<=M;i++){
             scanf("%d",&x);
             for(int j = 1; j <= x && j<=M ; j++){
                  arr[j]++;
                }
             int k = i;
             while(k>0){
                 if(arr[k] >= k){
                     res  += " "+to_string(k);
                     break;
                 }
                 k--;
             }
         }

        printf("Case #%d: %s \n", t, res.c_str());
            
    }
    return 0;
}

升级版

#include <cstdio>
#include <string>
#include <set>

using namespace std;
typedef long long LL;

int main(){
    int T;
    scanf("%d", &T);
    string res;
    for (int t = 1; t <= T; ++t) {
        int M;
        scanf("%d",&M);
        int x;
        string res = "";
        multiset<int> aset;
        for(int i = 0;i<M;i++){
            scanf("%d", &x);
            aset.insert(x);
            int b = *aset.begin();
            if(b < aset.size()){
                aset.erase(aset.begin());
            }
            res += " "+to_string(aset.size());
        }
        printf("Case #%d: %s \n", t, res.c_str());
            
    }
    return 0;
}

反思

想了一下这个题目和之前G轮的第一题BookReading有什么不同，结论是对于bookreading来说，首先是把所有的结果都统计一遍之后，然后所有的测试用例都只需要访问之前的结果就可以了（用一个数组来表示每页书的情况，然后读取对应因子的倍数得到相应的结果，其实就是上面链接中的动态规划第一版）从而可以节省大量的时间。但是本题目并不是这样子，本题目对于每次的有一个新的数值输入，都会进行新一轮的结果运算，所以如果我们再依次的去保存每一个位置的结果，尤其是通过对数组中元素依次加1的方式实现，复杂度必然会很高，现在想想的话，其实就比每次都完全的遍历一遍（每一次都从index开始找，看看paper数组中元素个数有没有大于index，找到第一个复合条件的即可,总的时间复杂度为O(n^3)）好那么一点。然后考虑到我们其实想要的结果是数量，所以就可以非常巧妙的利用一些数据结构，上面的测试用例用的是multiset，其实用最小堆，用unordered_map都可以实现，重点是保证内部元素有序，从而可以利用容器中的数量来变相的表示我们想要的结果。
以后要注意很多有用的数据结构的使用方法，比如本次使用的就是multiset这个数据结构，有机会再去看看其内部实现，首先要知道才能在真正要使用的时候想到它。
题目做的还是太少了，见识太少了，以后还是要坚持刷题啊，见多识广才能见怪不怪。
看人家的好的解题思路真的是赏心悦目，倍感舒适啊。