#P9694. [GDCPC2023] New but Nostalgic Problem

[GDCPC2023] New but Nostalgic Problem

题目描述

Given nn strings w1,w2,,wnw_1, w_2, \cdots, w_n, please select kk strings among them, so that the lexicographic order of string vv is minimized, and output the optimal string vv. String vv satisfies the following constraint: vv is the longest common prefix of two selected strings with different indices. Also, vv is the lexicographically largest string among all strings satisfying the constraint.

More formally, let S\mathbb{S} be a set of size kk, where all the elements in the set are integers between 11 and nn (both inclusive) and there are no duplicated elements. Let lcp(wi,wj)\text{lcp}(w_i, w_j) be the longest common prefix of string wiw_i and wjw_j, please find a set S\mathbb{S} to minimize the lexicographic order of the following string vv and output the optimal string vv.

$$v = \max\limits_{i \in \mathbb{S}, j \in \mathbb{S}, i \ne j} \text{lcp}(w_i, w_j) $$

In the above expression, max\max is calculated by comparing the lexicographic order of strings.

Recall that:

  • String pp is a prefix of string ss, if we can append some number of characters (including zero characters) at the end of pp so that it changes to ss. Specifically, empty string is a prefix of any string.
  • The longest common prefix of string ss and string tt is the longest string pp such that pp is a prefix of both ss and tt. For example, the longest common prefix of abcde and abcef is abc, while the longest common prefix of abcde and bcdef is an empty string.
  • String ss is lexicographically smaller than string tt (sts \ne t), if
    • ss is a prefix of tt, or
    • sp+1<tp+1s_{|p| + 1} < t_{|p| + 1}, where pp is the longest common prefix of ss and tt, p|p| is the length of pp, sis_i is the ii-th character of string ss, and tit_i is the ii-th character of string tt.
  • Specifically, empty string is the string with the smallest lexicographic order.

输入格式

There are multiple test cases. The first line of the input contains an integer TT indicating the number of test cases. For each test case:

The first line contains two integers nn and kk (2n1062\leq n\leq 10^6, 2kn2\leq k\leq n) indicating the total number of strings and the number of strings to be selected.

For the following nn lines, the ii-th line contains a string wiw_i (1wi1061\leq |w_i|\leq 10^6) consisting of lower-cased English letters.

It's guaranteed that the total length of all strings of all test cases will not exceed 10610^6.

输出格式

For each test case output one line containing one string indicating the answer. Specifically, if the answer is an empty string, print EMPTY\texttt{EMPTY}.

题目大意

【题目描述】

给定 nn 个字符串 w1,w2,,wnw_1, w_2, \cdots, w_n,请选出恰好 kk 个字符串,最小化字符串 vv 的字典序,并输出这个最优的字符串 vv。其中 vv 满足以下条件:vv 是被选出的字符串中,某两个编号不同的字符串的最长公共前缀。而且,vv 是所有满足条件的字符串中,字典序最大的字符串。

更正式地,令 S\mathbb{S} 表示一个大小为 kk 的集合,集合中的元素均为从 11nn 的整数(含两端),且没有重复的元素。令 lcp(wi,wj)\text{lcp}(w_i, w_j) 表示字符串 wiw_iwjw_j 的最长公共前缀,您需要找到一个集合 S\mathbb{S} 以最小化下述字符串 vv 的字典序,并输出这个最优的字符串 vv

$$v = \max\limits_{i \in \mathbb{S}, j \in \mathbb{S}, i \ne j} \text{lcp}(w_i, w_j) $$

上式中的 max\max 通过字典序比较两个字符串。

请回忆:

  • 称字符串 pp 是字符串 ss 的前缀,若可以在 pp 的末尾添加若干个字符(包括零个字符)将它变成 ss。特别地,空字符串是任意字符串的前缀。
  • 字符串 sstt 的最长公共前缀是一个最长的字符串 pp,满足 pp 既是 ss 的前缀,又是 tt 的前缀。例如,abcdeabcef 的最长公共前缀为 abc,而 abcdebcdef 的最长公共前缀为空字符串。
  • 称字符串 ss 的字典序小于字符串 ttsts \ne t),若
    • sstt 的前缀,或
    • sp+1<tp+1s_{|p| + 1} < t_{|p| + 1},其中 ppsstt 的最长公共前缀,p|p|pp 的长度,sis_i 表示字符串 ss 的第 ii 个字符,tit_i 表示字符串 tt 的第 ii 个字符。
  • 特别地,空字符串是字典序最小的字符串。

【输入格式】

有多组测试数据。第一行输入一个整数 TT 表示测试数据组数。对于每组测试数据:

第一行输入两个整数 nnkk2n1062\leq n\leq 10^62kn2\leq k\leq n),表示字符串的总数和需要选择的字符串的数量。

对于接下来 nn 行,第 ii 行输入一个由小写字母构成的字符串 wiw_i1wi1061\leq |w_i|\leq 10^6)。

保证所有数据中字符串长度之和不超过 10610^6

【输出格式】

每组数据输出一行一个字符串表示答案。特别地,若答案为空字符串,输出 EMPTY\texttt{EMPTY}

2
5 3
gdcpc
gdcpcpcp
suasua
suas
sususua
3 3
a
b
c
gdcpc
EMPTY