I have to submit an assignment and the teacher requires me to submit in Microsoft Word format, but I'm having issues so I hope that someone can help me.
Here is a simplified sample document:
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{CJKutf8}
\usepackage{ruby}
\begin{document}
English ligature: fish
Japanese:
\begin{CJK}{UTF8}{min}
\ruby{吾輩}{わがはい}は\ruby{猫}{ねこ}である。\ruby{名前}{なまへ}は%
まだ\ruby{無}{な}い。
\end{CJK}
Japanese Hepburn: Doko de umareta ka ton to kentō ga tsukanu.
\end{document}
I tried two conversion tools.
1: pandoc -o foo.docx foo.tex
This doesn't support the CJK environment correctly. The options UTF8 and min are not understood, but are inserted as text where I used them. Also, \ruby{foo}{bar} is removed. This is what it looks like:
English ligature: fish Japanese: UTF8min はである。はまだい。 Japanese Hepburn: Doko de umareta ka ton to kentō ga tsukanu.
2: pdflatex foo.tex, then open in Libreoffice Writer.
LaTeX does a lot of things and tells the underlying TeX to typeset lots of hboxes. LibreOffice then displays each hbox as a separate text box. It is virtually impossible to edit a document that is typeset like this (not my problem). I like that the line breaks are exactly as in the PDF, so this seems to be the best method so far. Hopefully it also looks correct in Microsoft Word if saved as docx (I have no way of checking).
The ligature in "fish" is removed in LibreOffice, and ō is changed to o, probably because there's no precomposed ō in T1.
If I instead copy & paste from Evince to Gedit, the ligature in fish becomes U+001C, and ō becomes <space>o. Why do I get a different behaviour in Gedit?
If I add \usepackage{lmodern}, then the ligature in "fish" works both in LibreOffice and when copying and pasting to Gedit. The ō works when copying and pasting to Gedit, but LibreOffice shows "¯o". Is there any way to fix this? Alternatively, is there any way to use "replace all"? Search and replace in LibreOffice apparently doesn't search through text boxes.
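For reference, this is a sketch of the lmodern variant of the sample. The only change is the extra \usepackage{lmodern} line. Putting it right after fontenc is just an assumption for this sketch; as far as I know its position in the preamble should not matter here.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern} % the only added line; placement is an assumption
\usepackage[utf8]{inputenc}
\usepackage{CJKutf8}
\usepackage{ruby}
\begin{document}
English ligature: fish
Japanese:
\begin{CJK}{UTF8}{min}
\ruby{吾輩}{わがはい}は\ruby{猫}{ねこ}である。\ruby{名前}{なまへ}は%
まだ\ruby{無}{な}い。
\end{CJK}
Japanese Hepburn: Doko de umareta ka ton to kentō ga tsukanu.
\end{document}
```

It is compiled the same way as before, with pdflatex foo.tex.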