#P3496. Word Hyphenation
Word Hyphenation
Description
Hyphenation bears importance in typesetting for its contribution to more efficient use of paper as well as more regular appearance of right-side margins without requiring significant spacing adjustments. Compare the following:
Without hyphenation | With hyphenation | |
---|---|---|
Flush left | Long words like supercalifragilisticexpialidocious and antidisestablishmentarianism make this paragraph look agonizing. | Even with long words like supercal-ifragilisticexpialidocious and antidis-establishmentarianism this paragraph looks comfortable. |
Justified | Long words like supercalifragilisticexpialidocious and antidisestablishmentarianism make this paragraph look agonizing. | Even with long words like supercal-ifragilisticexpialidocious and antidis-establishmentarianism this paragraph looks comfortable. |
Typesetting systems and word processors use hyphenation algorithms to hyphenate words automatically. Generally, hyphenation algorithms is based on either rules or dictionaries. Rule-based algorithms are inevitably subject to errors. For example, the vowel-consonant-consonant-vowel breaking rule, used in the original TeX hyphenation algorithm, does not work with words like selfadjoint (compare sel-fadjoint and self-adjoint). Dictionary-based algorithms require a vast amount of storage for dictionaries and cannot handle words missing from the dictionaries.
In the 1983 paper Word Hy-phen-a-tion by Com-put-er, Franklin Mark Liang proposed a neat solution to the problem. Liang’s algorithm is primarily based on rules, but the rules are generated from dictionaries. Rules in Liang’s algorithm are patterns formed by inserting single numbers into short sequences of letters. An odd number defines a hyphenating point; an even number defines an inhibiting point. For example, 2b5s2 means that for absorb, hyphenation can occur just after ab- but not before the first b or after s. Dots may appear at either ends of patterns, matching the corresponding ends of words. For example, .ca4t is effective in cation but not in dedication or copycat. If multiple patterns define both hyphenating and inhibiting points in the same position in a word, only the hyphenating/inhibiting point defined by the largest number is effective. In light of this rule, .ca4t prevents 1tio from incorrectly hyphenating cation into cat-ion. A final rule dictates that hyphenation should not occur just after the first letter or before the last letter, and words shorter than 5 letters should not be hyphenated.
With an carefully selected set of patterns, Liang’s algorithm requires a very small exception list to ensure that it correctly hyphenates all common words appearing in the dictionary. The lastest distribution of the TeX system uses an exception list consisting of only 14 words.
Given a set of hyphenation patterns and an exception list, use Liang’s algorithm to decide the hyphenating points of some words.
Input
The input is divided into two parts by a separator line containing a single #. The first part contains a list of hyphenation patterns and the exception list. The largest possible number that appears in the patterns is six. Words in the exception list are preceded by single @’s, with -’s indicating all possible hyphenating points. All letters in the first part are lowercase. The second part contains the words to be hyphenated. Words consist of only lowercase or uppercase letters. Matching between patterns and words is case-insensitive.
Output
For each word given, output the result of hyphenation using -’s to indicate all hyphenating points. Cases of letters should be preserved.
.ca4t hy3ph he2n hena4 hen5at
1na n2at 1tio 2io o2n. @ta-ble
#
hyphenation
cation
table
hy-phen-ation
cation
ta-ble
Source
POJ Founder Monthly Contest – 2008.01.31, frkstyc