fontenc | Hyphenation for accented Characters in Romanian

Information and discussion about TeXShop, an integrated LaTeX environment for Mac OS X
dusadrian
Posts: 9
Joined: Tue Jul 24, 2012 8:35 am

Post by dusadrian »

Hello everybody,

I use TeXShop on Mac OS X (Lion) and need to define hyphenation exceptions for words containing special Romanian characters such as ș and ț (s and t with a comma below, not with a cedilla).

The fonts are not in the T1 font encoding, therefore the processing stops with this error: "Improper \hyphenation will be flushed."

In the preamble, the relevant lines are:

%!TEX TS-program = pdflatexmk
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}

The .tex files are UTF-8 encoded, to allow for characters other than ASCII.

Thanks in advance for any tip,
Adrian

cgnieder
Site Moderator
Posts: 2000
Joined: Sat Apr 16, 2011 7:27 pm

Post by cgnieder »

dusadrian wrote: The fonts are not in the T1 font encoding, therefore the processing stops with this error: "Improper \hyphenation will be flushed."

In this case you should tell fontenc the right encoding instead of using T1, shouldn't you?

Welcome and regards
site moderator & package author

Post by dusadrian »

cgnieder wrote: In this case you should tell fontenc the right encoding instead of using T1, shouldn't you?

Yes indeed, and this is exactly where I need a little help...
I already tried other encodings (e.g. T2A, just to make sure it's not there), to no avail. Does anybody know which encoding is the right one for these characters, to use with the fontenc package?

Alternatively, is there any other way of "convincing" the hyphenation system to work with these characters?

Thanks in advance,
Adrian

Post by cgnieder »

FWIW: this does not seem like a font encoding problem to me but like an input encoding problem. This works for me:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{combelow}% provides \cb to place comma below character

\usepackage{newunicodechar}
\newunicodechar{ș}{\cb{s}}
\newunicodechar{ț}{\cb{t}}

\begin{document}
\cb{s} and \cb{t}

ș and ț
\end{document}

Regards

Post by dusadrian »

Yes, that does work... but: as the subject of this topic suggests, it is a hyphenation (not a typesetting) issue.

This (modified) example does not work:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{combelow}% provides \cb to place comma below character

% hypothetical word containing these characters
\hyphenation{blah-șbl-țah}

\usepackage{newunicodechar}
\newunicodechar{ș}{\cb{s}}
\newunicodechar{ț}{\cb{t}}

\begin{document}
\cb{s} and \cb{t}

ș and ț
\end{document}

Upon processing, it throws the following error:

./test.tex:7: Improper \hyphenation will be flushed.
\GenericError ->\protect 
                         \GenericError  
l.7 \hyphenation{blah-ș
                        bl-țah}
?

I hope this makes it clearer,
Adrian

Post by cgnieder »

I'm no expert here but the error message gives more:

! Improper \hyphenation will be flushed.
\cb ->\protect 
               \cb  
l.12 \hyphenation{blah-ș
                         bl-țah}
Hyphenation exceptions must contain only letters
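and hyphens.

\newunicodechar defines ș to be \cb{s}, and that seems to be the problem: inside \hyphenation the character turns into the macro \cb, which is neither a letter nor a hyphen. The TeX FAQ suggests using a font that actually contains the character.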
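
One possible workaround under pdfLaTeX, sketched here as an assumption rather than something tested in this thread, is to skip \hyphenation for such words and mark the permitted break points directly in the text with the discretionary hyphen \-. The word blahșblțah is the hypothetical one used in the examples above.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{combelow}% provides \cb to place comma below character

\usepackage{newunicodechar}
\newunicodechar{ș}{\cb{s}}
\newunicodechar{ț}{\cb{t}}

\begin{document}
% \- marks an allowed break point at this one occurrence of the word;
% it is ordinary text material, so the \cb macro does no harm here.
blah\-șbl\-țah
\end{document}
```

The drawback is that \- has to be repeated at every occurrence of the word, and TeX will then break that occurrence only at the marked positions.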

Maybe someone else knows more.

Best regards

Post by dusadrian »

Using \newunicodechar is interesting, but that seems not to be the problem...

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}

% hypothetical word containing these characters
\hyphenation{blah-șbl-țah}

\begin{document}
ș and ț
\end{document}

(Note the use of [utf8x] for the inputenc package.)

Without the \hyphenation line the above code does work, so I conclude that the culprit is the hyphenation (which should work in conjunction with the fontenc package)...

Thanks,
Adrian

Post by cgnieder »

Without newunicodechar I'm getting nowhere. Then there's the error:

! Package inputenc Error: Unicode char \u8:ș not set up for use with LaTeX.

So it's no wonder that in this case hyphenation doesn't work. But if you have a font that contains the characters, everything should work.

Alternatively -- I don't know why I didn't think of that earlier -- XeLaTeX seems to work nicely:

% arara: xelatex
\documentclass{article}
\usepackage{fontspec}

\hyphenation{blah-șbl-țah}
\begin{document}

ș and ț and Ș and Ț

\end{document}
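
To check that the exception is actually picked up, one option is the LaTeX kernel command \showhyphens, which typesets nothing and instead reports the break points TeX would use in the terminal and the log file. The following is a minimal sketch, assuming it is compiled with XeLaTeX like the example above:

```latex
% arara: xelatex
\documentclass{article}
\usepackage{fontspec}

\hyphenation{blah-șbl-țah}
\begin{document}

% The word should show up in the log with hyphens at the positions
% given in the \hyphenation exception.
\showhyphens{blahșblțah}

\end{document}
```

The report is disguised as an "Underfull \hbox" message, so it is easy to miss among other warnings.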

Post by dusadrian »

Oh, I know about that error... but that has nothing to do with the fontenc package. As the error itself states, it relates to the inputenc package.

That error disappears by using:

\usepackage[utf8x]{inputenc}

As previously mentioned, "please note the usage of utf8x..." (the key is the final "x" after utf8, which allows for an extended list of characters).
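
For comparison, the plain utf8 option can also be taught individual characters by hand with inputenc's \DeclareUnicodeCharacter, which is roughly what the newunicodechar lines in the earlier example do. The sketch below assumes the combelow package from that example and uses the code points U+0219 (ș) and U+021B (ț):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{combelow}% provides \cb to place comma below character

% Map the two code points by hand: U+0219 = ș, U+021B = ț.
\DeclareUnicodeCharacter{0219}{\cb{s}}
\DeclareUnicodeCharacter{021B}{\cb{t}}

\begin{document}
ș and ț
\end{document}
```

This only settles how the input bytes are read. In this sketch the characters still expand to the macro \cb, so they remain unusable inside \hyphenation.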

The XeLaTeX typesetter is fine, except for a single problem that I haven't been able to solve: it cannot correctly process a .bib file in order to make use of the bibliographical references. That is a different issue, in a different thread, but please believe me that I do have a small reproducible example in order to demonstrate what I just wrote (EDIT: I just started a new thread here).

Returning to hyphenation (the subject line), that seems to have very little (I'd say nothing) to do with the inputenc package, for which the problem is already solved(!) by using the utf8x encoding.

Rather, it seems to have a lot to do with how the fontenc package treats the special characters, or there's another trick that I am not aware of... and unfortunately I'm back to square one.

Perhaps I should try to convince XeLaTeX to deal with the .bib file instead (that would indeed be another solution). In any case, this hyphenation issue is rather weird.

Thanks again for your patience,
Adrian
localghost
Site Moderator
Posts: 9202
Joined: Fri Feb 02, 2007 12:06 pm

Post by localghost »

dusadrian wrote: […] The XeLaTeX typesetter is fine, except for a single problem that I haven't been able to solve: it cannot correctly process a .bib file in order to make use of the bibliographical references. That is a different issue, in a different thread, but please believe me that I do have a small reproducible example in order to demonstrate what I just wrote (EDIT: I just started a new thread here). […]

The typesetting engine is not responsible for the problem with BibTeX. I answered accordingly in the other thread.


Thorsten