Searchable ligatures in PDF docs from XeTeX

Information and discussion about XeTeX, an alternative to pdfTeX based on e-TeX
GhostDragon
Posts: 2
Joined: Wed Oct 31, 2012 4:55 pm

Searchable ligatures in PDF docs from XeTeX

Post by GhostDragon »

Hello. I'm trying to figure out why XeTeX produces PDFs in which ligatures are not searchable, even when I use OpenType fonts that have proper ligatures. E.g. "The fine boy" produces a ligature on 'Th' and 'fi' (which I do want), but when you search the document in Acrobat, Sumatra, OS X Preview or any other reader, it won't find these words. Using the Libertine font package seems to fix some ligatures but not all, and I want to use other fonts, like Adobe Text Pro, that are known to be well-specified. The documents look fine; they just don't search and export text as expected.

When I use LuaTeX to compile the document (see below) the search function works as expected even with Adobe Text Pro and ligatures. So I'm pretty confident it's not a problem with the font.

I would just use LuaLaTeX, but it's much, much slower for me. For now I am doing my composition and previewing with XeTeX and then producing my final output with LuaTeX, but I'm wondering if I'm just missing the right settings for XeTeX.

Example document:

Code:

\documentclass{book}
\usepackage{fontspec}
\usepackage{libertineotf}
%\setmainfont[Ligatures=TeX]{Adobe Text Pro}
\begin{document}
The quote is: ``Play in the field; riffle the deck.''
\end{document}

Compiled with XeTeX with the Libertine font enabled as above, "field" and "riffle" have ligatures and are searchable, but the initial "The" is not. With Libertine commented out and the Adobe Text Pro line uncommented, none of the ligatures are searchable at all.

Compile the same document with LuaTeX and everything is properly searchable, whether Libertine or Adobe Text Pro is enabled. I take it that the Th ligature issue with Libertine is due to the lack of a precomposed Th character in Unicode, but then how does LuaTeX manage it? Does XeTeX not support the alternate text feature of PDFs?
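
For reference, the "alternate text" feature asked about here can be attached by hand with the accsupp package. What follows is only a minimal sketch, not a confirmed fix: \ligTh is a hypothetical helper name, and it assumes that accsupp selects a working driver under XeTeX and that the PDF viewer honours /ActualText when searching.

```latex
% Sketch only: \ligTh is a hypothetical helper, not part of the original example.
\documentclass{book}
\usepackage{fontspec}
\usepackage{libertineotf}
\usepackage{accsupp}
% Wrap the two letters in an /ActualText span so that a viewer which
% honours it reads "Th" regardless of how the ligature glyph is mapped.
\newcommand{\ligTh}{%
  \BeginAccSupp{method=escape,ActualText=Th}Th\EndAccSupp{}%
}
\begin{document}
\ligTh e quote is: ``Play in the field; riffle the deck.''
\end{document}
```

Marking up every ligature this way is impractical for a whole book, so this is mainly useful as a test of whether a given viewer picks up the replacement text at all.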

I'm using TeXstudio 2.5 on both Windows (with the latest MiKTeX) and Mac OS X (MacTeX-2012).

meho_r
Posts: 823
Joined: Tue Aug 07, 2007 5:28 pm

Searchable ligatures in PDF docs from XeTeX

Post by meho_r »

This same issue has been bothering me for a long time. It seems to be caused (mainly) by the font design. E.g., if you compile your example with exljbris's Calluna or TypeTogether's Athelas, to name just two high-quality fonts, ligatures are searchable as expected (and so are text figures, i.e. oldstyle numbers, since they are the default in these fonts). For small caps, however, you'll probably have to use a custom command and hyperref's \texorpdfstring.
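
A custom command of the kind mentioned for small caps might look like the sketch below. The name \searchsc is hypothetical. Note that \texorpdfstring only chooses between a typeset form and a plain string for PDF strings such as bookmarks, so this shows the mechanism rather than a guaranteed fix for searching the page text.

```latex
% Sketch only: \searchsc is a hypothetical name chosen for illustration.
\documentclass{book}
\usepackage{fontspec}
\usepackage{hyperref}
% First argument of \texorpdfstring: what is typeset on the page.
% Second argument: the plain string used where hyperref needs PDF text.
\newcommand{\searchsc}[1]{\texorpdfstring{\textsc{#1}}{#1}}
\begin{document}
\chapter{\searchsc{Small caps} in a heading}
Body text follows.
\end{document}
```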

For more detailed information on this issue, there is an ongoing discussion on the XeTeX mailing list which you might find useful.
GhostDragon
Posts: 2
Joined: Wed Oct 31, 2012 4:55 pm

Re: Searchable ligatures in PDF docs from XeTeX

Post by GhostDragon »

Well, that's definitely an interesting and relevant link, so thanks for posting it. I read over the thread and don't see any obvious answers or even any solid agreement on what the problem is. All I can conclude for now is that LuaTeX does something magical to make this work, so I assume it must be possible on some level. I'll read over the thread in more detail later and post there if I can contribute to the discussion at all. Thanks!