Hyphenation of similar words
In my language (Bosnian) there are many similar words that should have similar hyphenation break points, but it's a hard job putting all of them in \hyphenation{} in the preamble. For example, consider these words:
"pretpostavka", "pretpostavke", "pretpostavku", "pretpostaviti", "pretpostavljati" etc.
I would like to set this scheme: "pret-po-stav" to apply to all variants of this word (common part) regardless of the suffix. Is it possible? Is there some kind of "regex" that can be used in \hyphenation for this situation (something like: "pret-po-stav*" where "*" will replace suffixes)?
- black-wolf
- Posts: 7
- Joined: Wed Jul 02, 2008 10:58 pm
Hyphenation of similar words
Hello,
Why don't you disable hyphenation?
Just add this to the preamble:
Code:
\usepackage[none]{hyphenat}
LaTeX will still justify the text as usual, but without hyphenation. I use this by default in my documents.
Greetings from Portugal

Re: Hyphenation of similar words
Thanks, but that's not an acceptable solution. In most cases hyphenation works correctly (I use the Croatian hyphenation scheme), but there are some words that need correction.
Hyphenation of similar words
What you are really looking for are hyphenation patterns for the Bosnian language. To my knowledge, they have not been implemented yet. TeX has a primitive called \patterns to add them, but it can be used only by INITEX, that is, while TeX is dumping a format (such as LaTeX). That is too difficult for a normal user, so one has to restrict oneself to \hyphenation and build a list of exceptions.
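As a minimal sketch of such an exception list, assuming the Croatian patterns are in use: \hyphenation has no wildcard, so every inflected form is written out in full. Only the common break points "pret-po-stav" requested above are marked here; the break points inside the suffixes are deliberately left out, which means TeX will not break those words anywhere else.

```latex
\documentclass{article}
\usepackage[croatian]{babel}
% Exceptions are stored for the language that is active at this point;
% babel is assumed to have made croatian the active language here.
% There is no wildcard syntax, so each variant is listed explicitly.
\hyphenation{
  pret-po-stavka pret-po-stavke pret-po-stavku
  pret-po-staviti pret-po-stavljati
}
\begin{document}
% The log file shows the break points TeX will now allow for these words.
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\end{document}
```

A word listed in \hyphenation is broken only at the marked positions, so any further permissible break points in the endings would have to be added to each entry by hand.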
Anyway, I've done a test that may help. I've compiled the following code:
Code:
\documentclass{article}
\usepackage[croatian,serbian,english]{babel}
\begin{document}
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\selectlanguage{serbian}
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\selectlanguage{croatian}
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\end{document}
There is no output. That doesn't matter, since the important information is in the log file: the \showhyphens command writes there the positions where hyphens could be placed. I copy the relevant lines:
Code:
[] \OT1/cmr/m/n/10 pret-postavka pret-postavke pret-postavku pret-postaviti pre
t-postavl-jati
[] \OT1/cmr/m/n/10 pret-po-stavka pret-po-stavke pret-po-stavku pret-po-sta-vit
i pret-po-sta-vljati
[] \OT1/cmr/m/n/10 pret-pos-tavka pret-pos-tavke pret-pos-tavku pret-pos-ta-vit
i pret-pos-tav-ljati
These lines show the hyphenation points when English, Serbian and Croatian, in that order, are the active languages. It is clear that the English patterns are not valid. However, for the words considered here, the Serbian patterns seem to fit better than the Croatian ones. You may perform more extensive tests and see whether it is convenient for you to switch to Serbian or to any other language from your geographical area.
The CTAN lion is an artwork by Duane Bibby. Courtesy of www.ctan.org.
Re: Hyphenation of similar words
In other words, if I use the Croatian hyphenation patterns, there is only one way to handle every exception: manually enter all the variants. Well... OK then. Thank you very much for your replies.
BTW, a little bit off topic: where can I find instructions on adding support for a new language to babel? I tried to find the babel homepage, but without success. Any advice is appreciated.
Hyphenation of similar words
meho_r wrote: In other words, if I use croatian hyphenation pattern, for all exception there's only one way: manually input all variants.
Yes, that's right.
meho_r wrote: BTW, a little bit off topic, where to look for instructions and procedure about adding new language support for babel? I tried to find babel homepage but without success. Any advice is appreciated.
To my knowledge, support is provided at two different levels:
(a) hyphenation patterns,
(b) special macros, name translations, particular layouts...
The file <language>.ldf is generated by running TeX on <language>.ins, which strips the code from <language>.dtx (also located in texmf-dist/source/generic/babel). A further run of TeX on <language>.dtx yields the documentation for that language. So remember: first the ins file, then the dtx file. In the attached zip file I provide bosnian.ins and bosnian.dtx, which I obtained by suitably modifying the corresponding Croatian files. For your convenience, I've generated bosnian.ldf and the documentation (bosnian.pdf). For completeness, I also provide bosnian.sty and a simple test file. You'll see that I've tried to localise the \today macro (I looked up the names of the months in an on-line dictionary). Until Bosnian hyphenation patterns become available, the Bosnian language uses the Croatian ones (that's the \let\l@bosnian\l@croatian command near the beginning of bosnian.ldf).
I hope that you can continue improving bosnian.dtx. To do so, you need to understand the particular syntax used there. Look at this tutorial. Even more importantly, you need to understand a bit of how the babel package works.
Now, let's go with (a). Hyphenation patterns are defined in the file hyph-<language abbreviation>.tex, located in texmf-dist/tex/generic/hyph-utf8/patterns. The meaning of the patterns and the hyphenation algorithm are explained in Appendix H of The TeXbook. Some configuration files are also required. Adapting all that for Bosnian is really a hard job, and once it is done, a new LaTeX format has to be built before the patterns become usable. You should ask for help: perhaps in comp.text.tex, in Bosnian institutions, from contributors to the hyphenation patterns of other languages... I don't know. In the meantime, I think you should seriously consider performing extensive tests (with the help of \showhyphens) to see whether the current Croatian patterns are really better suited to Bosnian than the Serbian ones. If you compare hyph-hr.tex (Croatian) and hyph-sh-latn.tex (Serbo-Croatian), it seems that the latter contains more patterns, including groups of four or more letters, while the former considers at most four letters and is therefore less complete and more error prone. If you are finally convinced that hyph-sh-latn.tex would be a better starting point for the Bosnian patterns, I would recommend changing \let\l@bosnian\l@croatian to \let\l@bosnian\l@serbian in bosnian.dtx (and therefore in bosnian.ldf).
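The effect of that change can be tried out at the document level before touching bosnian.dtx. The following is only a sketch under some assumptions: it is not the content of the attached bosnian.ldf, the helper \usebosnianpatterns is a hypothetical name introduced here, and it presumes that the installed format provides Serbian patterns under the name \l@serbian.

```latex
\documentclass{article}
\usepackage[serbian,croatian]{babel}
\makeatletter
% Stand-in for the line in bosnian.ldf: let the Bosnian language number
% point to the Serbian patterns. babel normally defines \l@serbian once
% the serbian option is loaded, so the first branch is only a safety net.
\ifx\l@serbian\@undefined
  \let\l@bosnian\l@croatian
\else
  \let\l@bosnian\l@serbian
\fi
% Hypothetical helper: switch only the hyphenation patterns, nothing else.
\newcommand*{\usebosnianpatterns}{\language\l@bosnian\relax}
\makeatother
\begin{document}
\usebosnianpatterns
% Check the log file for the break points produced with these patterns.
\showhyphens{pretpostavka pretpostavke pretpostavku pretpostaviti pretpostavljati}
\end{document}
```

Swapping \l@serbian for \l@croatian in the second branch and comparing the two log files gives a quick way to judge which set of patterns fits a given word list better.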
I hope all this is really of some help for you.
Attachments: bosnian.zip (57.12 KiB)
Re: Hyphenation of similar words
Wow, thank you very, very much. This is a great help indeed.
In fact, I tried to do something similar in the past, but I couldn't figure out how the files relate to each other or which files are needed for a language to work properly. I changed every file in the texlive folder that had "croatian" or "hr" in its name.
And all that's needed are four files. Ahhh... And, of course, I missed the key point, \let\l@bosnian\l@croatian, so the hyphenation wasn't right.
As I've concluded, the path for those four files is:
1. for .ins and .dtx files: <texlive>/texmf-dist/source/generic/babel
2. for .ldf and .sty files: <texlive>/texmf-dist/tex/generic/babel
I tried it and it works perfectly. BTW, you've done terrific work making the changes in the .dtx file. I'm amazed :)
I will try to do something in this regard in the future. I'll try to contact some institutions, although I don't expect much, except maybe from our LUG and their localization team. And if we manage to make hyphenation patterns, they can be used in OpenOffice.org too (as I was informed), so many will benefit from them.
Again, thank you very, very much