Hyphenation of similar words
"pretpostavka", "pretpostavke", "pretpostavku", "pretpostaviti", "pretpostavljati" etc.
I would like to set this scheme: "pret-po-stav" to apply to all variants of this word (common part) regardless of the suffix. Is it possible? Is there some kind of "regex" that can be used in \hyphenation for this situation (something like: "pret-po-stav*" where "*" will replace suffixes)?
- black-wolf
Hyphenation of similar words
Why don't you disable hyphenation?
Just add this to the preamble:
Code:
\usepackage[none]{hyphenat}
Greetings from Portugal

Re: Hyphenation of similar words
Anyway, I've done a test that may help. I've compiled the following code:
Code:
\documentclass{article}
\usepackage[croatian,serbian,english]{babel}
\begin{document}
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\selectlanguage{serbian}
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\selectlanguage{croatian}
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\end{document}
Code:
[] \OT1/cmr/m/n/10 pret-postavka pret-postavke pret-postavku pret-postaviti pret-postavl-jati
[] \OT1/cmr/m/n/10 pret-po-stavka pret-po-stavke pret-po-stavku pret-po-sta-viti pret-po-sta-vljati
[] \OT1/cmr/m/n/10 pret-pos-tavka pret-pos-tavke pret-pos-tavku pret-pos-ta-viti pret-pos-tav-ljati
Re: Hyphenation of similar words
BTW, a little bit off topic, where to look for instructions and procedure about adding new language support for babel? I tried to find babel homepage but without success. Any advice is appreciated.
Hyphenation of similar words
meho_r wrote: In other words, if I use croatian hyphenation pattern, for all exception there's only one way: manually input all variants.
Yes, that's right.
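Since \hyphenation accepts no wildcard, one way to avoid typing every variant by hand is to let a small macro build the list. The following is only a sketch under some assumptions: \stemhyphenation is a hypothetical helper of my own (it is not part of babel or the LaTeX kernel), the suffix list is just the five forms mentioned above, and I assume pdfLaTeX with the Croatian patterns present in the format.

```latex
\documentclass{article}
\usepackage[croatian]{babel}

\makeatletter
% Hypothetical helper (not part of babel or the LaTeX kernel):
%   #1 = stem with its break points, #2 = comma-separated suffixes (no spaces)
\newcommand*{\stemhyphenation}[2]{%
  \@for\stem@suffix:=#2\do{%
    % \hyphenation expands macros while reading its argument,
    % so each pass registers the word "<stem><suffix>"
    \hyphenation{#1\stem@suffix}%
  }%
}
\makeatother

\begin{document}
% Issued after \begin{document} so the exceptions are stored for the
% language that is active here (Croatian)
\stemhyphenation{pret-po-stav}{ka,ke,ku,iti,ljati}

\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\end{document}
```

Keep in mind that an exception allows breaks only at the marked positions, so with this sketch the suffixes themselves are never split. Every suffix still has to be listed explicitly; the macro only saves retyping the stem.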
meho_r wrote: BTW, a little bit off topic, where to look for instructions and procedure about adding new language support for babel? I tried to find babel homepage but without success. Any advice is appreciated.
To my knowledge, support is provided at two different levels:
- (a) hyphenation patterns,
- (b) special macros, name translations, particular layouts...
The file <language>.ldf is generated by running TeX on <language>.ins, which strips the code from <language>.dtx (both are located in texmf-dist/source/generic/babel). Running TeX on <language>.dtx afterwards yields the documentation for that language. So remember: first the .ins file, then the .dtx file. In the attached zip file I provide bosnian.ins and bosnian.dtx, which I obtained by making suitable changes to the corresponding Croatian files. For your convenience, I've also generated bosnian.ldf and the documentation (bosnian.pdf). For completeness, I also provide bosnian.sty and a simple test file. You'll see that I've tried to localise the \today macro (I looked up the names of the months in an online dictionary). Until Bosnian hyphenation patterns become available, Bosnian uses the Croatian ones (that's the \let\l@bosnian\l@croatian command near the beginning of bosnian.ldf).
I hope that you can continue improving bosnian.dtx. To do so, you need to understand the particular syntax used there. Look at this tutorial. Even more, though, you need some understanding of the babel package itself.
Now, let's go on with (a). Hyphenation patterns are defined in the file hyph-<language abbreviation>.tex, located in texmf-dist/tex/generic/hyph-utf8/patterns. The meaning of the patterns and the hyphenation algorithm are explained in Appendix H of The TeXbook. Some configuration files are also required. Adapting all that for Bosnian is a really hard job. And once it is adapted, a new LaTeX format must be built before the patterns become usable. You should ask for help: perhaps in comp.text.tex, at Bosnian institutions, from contributors of hyphenation patterns for other languages... I don't know. In the meantime, I think you should run extensive tests (with the help of \showhyphens) to see whether the current Croatian patterns are really better suited to Bosnian than the Serbian ones. If you compare hyph-hr.tex (Croatian) and hyph-sh-latn.tex (Serbo-Croatian), it seems that the latter contains more patterns, including groups of four or more letters, while the former considers at most four letters, which makes it less complete and more error-prone. If you end up convinced that hyph-sh-latn.tex would be a better starting point for the Bosnian patterns, I would recommend changing \let\l@bosnian\l@croatian to \let\l@bosnian\l@serbian in bosnian.dtx (and therefore in bosnian.ldf).
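To make that last suggestion concrete, the fallback could look roughly like the fragment below. This is only a sketch: the \ifx guard around the assignment is my assumption about how one might protect real Bosnian patterns once they exist, and only the two \let lines come from the discussion above. Inside an .ldf file the @ character already counts as a letter; the \makeatletter pair is there only so the fragment can also be tried in a preamble after babel is loaded.

```latex
\makeatletter
% Assumed guard: act only when the format has no Bosnian patterns of its own
\ifx\l@bosnian\@undefined
  % Current choice in the attached files: borrow the Croatian patterns
  \let\l@bosnian\l@croatian
  % Alternative, if tests with \showhyphens favour hyph-sh-latn.tex:
  % \let\l@bosnian\l@serbian
\fi
\makeatother
```

Whichever line is active decides which pattern set Bosnian text is hyphenated with, so only one of the two \let lines should be uncommented at a time.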
I hope all this is really of some help for you.
Attachment: bosnian.zip (57.12 KiB)
Re: Hyphenation of similar words

In fact, I tried to do something similar in the past, but I couldn't figure out the relation between the files or which files are needed for a language to work properly. I changed every file in the texlive folder that has "croatian" or "hr" in its name.

From what I've worked out, the paths for those four files are:
1. for .ins and .dtx files: <texlive>/texmf-dist/source/generic/babel
2. for .ldf and .sty files: <texlive>/texmf-dist/tex/generic/babel
I tried it and it works perfectly. BTW, you've done terrific work making the changes in the .dtx file. I'm amazed :)
I will try to do something in this regard in the future. I'll try to contact some institutions, although I don't expect much, except maybe from our LUG and their localization team. And if we manage to create hyphenation patterns, they can be used in OpenOffice.org too (as I was informed), so many will benefit from it.

Again, thank you very, very much.
