Fonts & Character Sets: cut and pasting accented characters from pdf file

Information and discussion about fonts and character sets (e.g. how to use language specific characters)
novicedude
Posts: 16
Joined: Tue May 25, 2010 10:38 pm

cut and pasting accented characters from pdf file

Post by novicedude »

I am using TeXnicCenter running on windows to generate pdf files from my LaTeX files. I have the following lines in my LaTeX file

Code:

\pdfglyphtounicode{ff}{0066 0066}
\pdfglyphtounicode{fi}{0066 0069}
\pdfglyphtounicode{fl}{0066 006C}
\pdfglyphtounicode{ffi}{0066 0066 0069}
\pdfglyphtounicode{ffl}{0066 0066 006C}
\pdfglyphtounicode{IJ}{0049 004A}
\pdfglyphtounicode{ij}{0069 006A}
\pdfglyphtounicode{quoteleft}{0027}
\pdfglyphtounicode{quotedblleft}{0022}
These correct the ligatures and quote symbols so that I can copy them from the PDF file and paste them into another file (e.g. Word). I would like to do the same thing with accented characters. For instance, if I have the word café (with the accent on the e), when I copy and paste it from the resulting PDF file, I get caf_e. Are there corresponding \pdfglyphtounicode lines to handle accented characters?

(I'm using the LaTeX string caf\'{e} to generate the café -- is that the most appropriate way, or is there some other trick that would make cutting and pasting more streamlined?)

Thanks for any help.

---edit---
I'm looking for the accented e right now, but it won't be long before I need other accented letters and foreign characters. So, if someone could point me to a table of them, or to a way of figuring out what they are, that'd be great - thanx again.
Last edited by Stefan Kottwitz on Tue Aug 02, 2011 10:15 am, edited 1 time in total.

paul
Posts: 49
Joined: Thu Apr 08, 2010 5:56 am

cut and pasting accented characters from pdf file

Post by paul »

novicedude wrote: \pdfglyphtounicode{IJ}{0049 004A}
\pdfglyphtounicode{ij}{0069 006A}

FYI, you shouldn't be using those characters; they're only in Unicode for compatibility with some pre-Unicode encoding. You should be using "ij" (distinct letters, not a ligature) in any new material you create.

novicedude wrote: These correct the ligatures and quote symbols so that I can copy them from the PDF file and paste them into another file (e.g. Word).

Doesn't that just work already? All the characters above (with the possible exception of the "ij" ligature) copy out of PDF files correctly without having to do anything special. The problem I have is with things like the "Th" ligature in certain fonts, which gets encoded in the private use area...

novicedude wrote: (I'm using the LaTeX string caf\'{e} to generate the café -- is that the most appropriate way, or is there some other trick that would make cutting and pasting more streamlined?)

I just put an é in the source file, with the appropriate input encoding...or XeTeX...
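
For what it's worth, here is a minimal sketch of the first option under pdfLaTeX. It assumes the source file is saved as UTF-8; the utf8 option has to match whatever encoding your editor actually writes, so treat that as an assumption rather than a given.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % assumption: this file is saved as UTF-8
\begin{document}
café      % typed directly in the source
caf\'{e}  % same word via the accent macro
\end{document}
```

Both lines should typeset the same word. The input encoding only changes how you type it, not how the glyph ends up in the PDF.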
frabjous
Posts: 2064
Joined: Fri Mar 06, 2009 12:20 am

cut and pasting accented characters from pdf file

Post by frabjous »

Try putting:

Code:

\usepackage[T1]{fontenc}
in your preamble. That should make it possible to copy and paste characters like é from the PDF. (Or use another option to that package more appropriate for the fonts or font packages you're using.)

Whether you use \'{e} or use é (with the appropriate input encoding) in the source shouldn't affect whether or not you can copy and paste the output.
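
As a rough sketch of how this could fit together under pdfLaTeX: T1 gives the font a real é glyph (with the default encoding the é is built from an accent plus an e, which is a likely reason for the caf_e), and pdfTeX can be asked to write glyph-to-Unicode information for every glyph at once. The file glyphtounicode.tex ships with pdfTeX and is essentially a long list of \pdfglyphtounicode lines, so it also serves as the table of glyph names. The choice of lmodern here is only an assumption, and whether pasting then works can still depend on the PDF viewer.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}  % font encoding with precomposed accented glyphs
\usepackage{lmodern}      % assumption: Latin Modern as a scalable T1 font
\input{glyphtounicode}    % standard table, lines such as
                          %   \pdfglyphtounicode{eacute}{00E9}
\pdfgentounicode=1        % ask pdfTeX to write the ToUnicode maps
\begin{document}
caf\'{e}, difficult baffles
\end{document}
```

Any \pdfglyphtounicode line of your own placed after the \input overrides the table entry for that glyph name, which is how the straight-quote mappings from the first post could be kept.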
novicedude
Posts: 16
Joined: Tue May 25, 2010 10:38 pm

Re: cut and pasting accented characters from pdf file

Post by novicedude »

I added \usepackage[T1]{fontenc}, but it complained about needing a stretchable font (I was using the cmr font). So I tried adding the following lines (one at a time):

\usepackage{lmodern}
\usepackage{pslatex}
\usepackage{ae,aecompl}

With the ae,aecompl the cut and paste was the same as it was without any of these lines added (I'd get the caf_e). When I had the pslatex line, the cut and paste worked okay, but the pdf output lost all the ff, ffi, and ffl ligatures. With the lmodern, the pdf file looks ok except when I copy/paste, I lose the accent on the e (I get cafe instead of café).

While trying different variants, I started looking at some of the various fonts out there. If I had my druthers, I'd like to use the Adobe Garamond font. I've got it on my computer (it works in Word). I tried looking around to see how to use it with LaTeX, but the only stuff I could find talked about getting it to work on linux systems. Is it possible (without too many complications) to use these other fonts (like the ones usable from Word) with LaTeX?
frabjous
Posts: 2064
Joined: Fri Mar 06, 2009 12:20 am

cut and pasting accented characters from pdf file

Post by frabjous »

What LaTeX distribution and version are you using? Does installing the cm-super package help?
novicedude wrote: While trying different variants, I started looking at some of the various fonts out there. If I had my druthers, I'd like to use the Adobe Garamond font. I've got it on my computer (it works in Word). I tried looking around to see how to use it with LaTeX, but the only stuff I could find talked about getting it to work on linux systems. Is it possible (without too many complications) to use these other fonts (like the ones usable from Word) with LaTeX?
If you use XeLaTeX, using system fonts is a breeze. See, e.g., the fontspec package.

If you need to use pdflatex instead of xelatex, you could use URW Garamond, e.g., through the mathdesign package. URW Garamond may or may not already be installed on your TeX system, depending on which one you're using.
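
A minimal sketch of the XeLaTeX route, assuming the font is installed in Windows under the name shown (the name is an assumption; use whatever name the system reports for your copy of the font):

```latex
% compile with xelatex, not pdflatex
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Adobe Garamond Pro} % assumed name of an installed system font
\begin{document}
The quick brown fox jumps over the lazy dog.
\end{document}
```

The point of the sketch is only that the font is selected by its system name, with no separate font installation step on the TeX side.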
novicedude
Posts: 16
Joined: Tue May 25, 2010 10:38 pm

Re: cut and pasting accented characters from pdf file

Post by novicedude »

I am using TeXnicCenter (version 1.0 rc1) with MiKTeX 2.9 and pdfTeX. This is all running on Windows 7 x64. I've installed several packages, but I don't think any of them were XeLaTeX. Is that a whole nother program or added via the \usepackage line?
novicedude
Posts: 16
Joined: Tue May 25, 2010 10:38 pm

cut and pasting accented characters from pdf file

Post by novicedude »

Being the adventurous person I am, I put this line in my file:
\usepackage{fontspec}

When I ran it, it started to install fontspec, but then I got an error

Code:

starting package maintenance...
installation directory: "C:\Program Files\MiKTeX 2.9"
package repository: http://mirror.hmc.edu/ctan/systems/win32/miktex/tm/packages/
lightweight database digest: 92bae9486d377a62301f73b50c74cea7
going to download 3034328 bytes
going to install 39 file(s) (1 package(s))
downloading http://mirror.hmc.edu/ctan/systems/win32/miktex/tm/packages/expl3.tar.lzma...
pdflatex.EXE: Error response from server: 404
and prompts me to specify where expl3.sty is.

It's hard to understand some of these documents as they all seem to assume that you're running on linux. I'm dumb that way -- sorry (and thanx for bearing with me).
novicedude
Posts: 16
Joined: Tue May 25, 2010 10:38 pm

Re: cut and pasting accented characters from pdf file

Post by novicedude »

I ran it with the mathdesign package. It installed okay, but the font it used in the pdf file (Adobe listed it as font F15) looked like doodoo, and when I attempted to copy/paste, all the ligatures were screwed up.

Is XeLaTeX a program that runs instead of pdftex? Does it output pdf files? That's mainly the file type I need.
frabjous
Posts: 2064
Joined: Fri Mar 06, 2009 12:20 am

Re: cut and pasting accented characters from pdf file

Post by frabjous »

Yes, XeLaTeX is a program that you run instead of pdf(la)tex. Yes, it outputs PDF files. You need to use it to use fontspec. (Though the server error is probably something else. Others have reported MiKTeX server errors, but I don't know anything about that. I use TeX Live.)
novicedude
Posts: 16
Joined: Tue May 25, 2010 10:38 pm

cut and pasting accented characters from pdf file

Post by novicedude »

Funny... I seem to have gone almost full circle. I managed to get TeXnicCenter/MiKTeX and XeLaTeX working. I am running the following through it:

Code:

\documentclass[]{article}
\usepackage[T1]{fontenc}
\usepackage{xltxtra,fontspec,xunicode}
\defaultfontfeatures{Scale=MatchLowercase}
\title{Font Fun}
\begin{document}
\defaultfontfeatures{Mapping=tex-text}

\thispagestyle{empty}

\newcommand{\testString}{If a word is ``quoted'', will that work, `single' quotes, or typed ``double" quotes? When different fluids fill difficult baffles. What about the café or caf\'{e}?
%{\it italics} {\sc Small Caps Swim}
The quick brown fox jumps over the lazy dog.}

\newcommand{\testFont}[1]{{\setromanfont{#1}\section{#1}\testString}}

\testFont{Adobe Garamond Pro}
\testFont{Palatino Linotype}
\testFont{Freestyle Script}
\testFont{Kunstler Script}
\testFont{Old English Text MT}

\end {document}
All the fonts show up real nice in the PDF file, with the exception of the first café. The é shows up as several different symbols, mostly boxes, depending upon which font I'm using.

Then when I attempted to copy & paste into Word, some of the ligatures didn't translate correctly and some of them did. The three letter ligatures (ffi and ffl) translated correctly, but the two letter ones (fi and fl) didn't. With the Garamond font, the Th in The didn't translate, but with the other fonts I used, it translated okay.

When I put the \pdfglyphtounicode commands in, I just got undefined command errors.
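
A note on that last error: \pdfglyphtounicode is a pdfTeX primitive, so XeLaTeX does not know it. Below is a hedged sketch of a trimmed-down version of the test file, not a confirmed fix. The assumptions are: the file is saved as UTF-8 (if the editor saves it in a legacy Windows code page, a directly typed é may come out as a box or a wrong symbol), the font name matches an installed font, and the installed fontspec is recent enough to know Ligatures=TeX and Ligatures=NoCommon. The T1 fontenc line is left out because fontspec sets up its own font encoding.

```latex
% compile with xelatex; assumption: this file is saved as UTF-8
\documentclass{article}
\usepackage{fontspec} % no \usepackage[T1]{fontenc} here
\defaultfontfeatures{Ligatures=TeX} % `` '' and -- behave as in pdfLaTeX
\setmainfont{Adobe Garamond Pro}    % assumed installed font name
% Possible variant if pasted text matters more than the look of fi/fl:
% \setmainfont[Ligatures={TeX,NoCommon}]{Adobe Garamond Pro}
\begin{document}
When different fluids fill difficult baffles.
What about the café or caf\'{e}?
\end{document}
```

Whether turning off common ligatures also affects the Th in a given font depends on how that font classifies it, so that part may need experimenting.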