#P7086. [NWRRC2013] Heavy Chain Clusterization

[NWRRC2013] Heavy Chain Clusterization

题目描述

A group of biologists is trying to find a cure for a viral disease. They have tried many antibodies of various origins that could potentially fight the viral antigens, and have selected nn antibodies that seem to work best during experiments.

Each antibody is identified by its heavy chain -- a sequence of amino acids.

The set of antibodies form a similarity cluster, if at least one of the following holds:

k-prefixes (first kk amino acids) of all their heavy chains are equal;

k-suffixes (last kk amino acids) of all their heavy chains are equal.

In order to simplify the future research, biologists want to group antibodies to similarity clusters.

You need to split the given antibodies to a minimum number of similarity clusters.

输入格式

The first line contains two integers nn and kk -- the number of heavy chains and the length of sequence of amino acids to coincide (1n5000,1k550)(1 \le n \le 5 000 , 1 \le k \le 550) .

The following nn lines contain sequences of amino acids that form heavy chains of antibodies. Each amino acid described with an uppercase English letter. Each heavy chain contains at least kk and no more than 550550 amino acids.

输出格式

The first line of output must contain a single integer -- the minimum number of similarity clusters. The following lines must contain descriptions of clusters, one per line.

Each description starts with mim_i -- the number of antibodies in the cluster and is followed by mim_i integers -- numbers of these antibodies. Antibodies are numbered in the order of appearance in the input starting from one.

Each antibody must be present in exactly one cluster.

题目大意

现有nn个抗生素,每一个抗生素都可以通过它的一个氨基酸序列被辨认。

如果有一组抗生素满足以下任意一个条件,这些抗生素才可以形成一个“相似群体”:

  1. 组内所有抗生素的氨基酸序列的前kk个氨基酸元素组成的序列全部相同

  2. 组内所有抗生素的氨基酸序列的后kk个氨基酸元素组成的序列全部相同

特别的,一种抗生素可以成为一个相似群体。

为了简化后续研究,生物学家们想要给抗生素分类,使得这nn个抗生素被分成若干个相似群体。你需要找出一种方案,使得分类后的相似群体数量最少。


第一行包含两个正整数,nnkk,含义如题意描述。(1n50001k5501≤n≤5000,1≤k≤550

接下来有nn行,每行包含一个氨基酸序列。氨基酸序列被描述为由大写英语字母组成的字符串,每个序列包含不少于kk个,不多于550550个英文字符。每个英文字符代表一种不同的氨基酸。


第一行包含一个正整数ansans,表示被分成的相似群体最小可能的数目。

接下来有ansans行,每行第一个数为mim_i,表示第ii个相似群体有mim_i个元素。接下来有mim_i个数,第jj个数表示这个相似群体包含输入中的第jj个抗生素。(编号从11开始。)

每一个抗生素必须恰好出现一次。

4 1
AA
AB
BB
BA

2
2 1 2
2 3 4

3 2
ABA
BAB
XY

3
1 1
1 2
1 3

提示

Time limit: 2 s, Memory limit: 256 MB.