yakiv
Posts: 10
Joined: Fri Oct 01, 2010 2:02 pm

Problems with seemingly unformatted Greek text

Post by yakiv »

I am totally new to LaTeX and the use of tools like MiKTeX and TeX Live. I have, over the past week or so, become more familiar with basic document code.

I have obtained a copy of the Greek NT, in Babel. It apparently is nicely formatted for each book (bibbook) and each verse (vs). There are also, apparently, footnotes (fn), and I am not certain how they are used either. The document that I have did not come with the proper code, such as this:

Code: Select all

\documentclass{article}
\usepackage[greek]{babel}
And it was not wrapped with

Code: Select all

\begin{document}
and

Code: Select all

\end{document}
Apparently bibbook needs to be defined in some way and lots of other parameters need to be set? I am gathering that it is a matter of defining my own classes? Or is this wrong?

I hope for some direction from the experienced LaTeX experts.

Here is a sample of the document that I have:

Code: Select all

\bibbook{TO KATA MATJAION\\AGION EUAGGELION}{KATA MATJAION}{Matthew}{Mat}[\textgreek{KATA MATJAION}]
\vs Mat 1:1 B'ibloc gen'esewc >Ihso~u Qristo~u, u<io~u Dab`id, u<io~u >Abra'am.
\fn{b'ibloc\gram{nnfs} g'enesic\gram{ngfs} >Ihso~uc\gram{ngms} Qrist'oc\gram{ngms} u<i'oc\gram{ngms} Dau'id\gram{tp} u<i'oc\gram{ngms} >Abra'am\gram{tp}.}
Ultimately, I would love to be able to produce a normal PDF with bookmarking. But my main goal is to have a *.txt file with each verse on its own line, in Greek font, with all of the proper accents, etc., in UTF-8. That's really the primary goal I am trying to achieve at the moment.

...I do see that XeLaTeX outputs PDFs that contain Greek in actual Greek font, that can be copied and pasted out. However, if there is an easier way to go straight to .txt, that would be amazing.

magister
Posts: 17
Joined: Sun Aug 02, 2009 4:42 am

Re: Problems with seemingly unformatted Greek text

Post by magister »

The text you have is not UTF-8 (Unicode); it's in some earlier encoding. I use only Unicode Greek, so I can't be more specific. If what you want is a .txt file in Unicode with each verse on a separate line, you would be better off finding a Unicode Greek file you can download. See, for instance, http://www.ntgateway.com/greek-ntgateway/fonts/; scroll down a little ways to the section on copying and pasting text. Trying to re-encode the file you have would be time-consuming and error-prone. Just take the Unicode Greek text and add line breaks as needed, using any Unicode-based editor or word processor, and save as UTF-8. LaTeX is not necessary.

If you do want to work with Unicode text in LaTeX, you should use XeTeX, which is a Unicode-based version of TeX. Use the polyglossia package, which replaces babel, along with fontspec to choose fonts.
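
As a rough starting point, a minimal XeLaTeX file along those lines might look like the sketch below. The font name is an assumption on my part (any installed OpenType font with polytonic Greek coverage should do), and the sample line is simply the first verse from the snippet above typed directly as Unicode. Compile it with xelatex rather than pdflatex.

Code: Select all

% Minimal sketch; compile with xelatex.
\documentclass{article}
\usepackage{fontspec}     % font selection for XeTeX
\usepackage{polyglossia}  % takes the place of babel
\setmainlanguage[variant=polytonic]{greek}
% Assumption: "GFS Didot" is installed; substitute any font with polytonic Greek.
\setmainfont{GFS Didot}
\begin{document}
Βίβλος γενέσεως Ἰησοῦ Χριστοῦ, υἱοῦ Δαβὶδ, υἱοῦ Ἀβραάμ.
\end{document}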

David
yakiv
Posts: 10
Joined: Fri Oct 01, 2010 2:02 pm

Re: Problems with seemingly unformatted Greek text

Post by yakiv »

@magister, thank you for your reply.

I am aware that what I have is not UTF-8. I was told that it is in "standard babel."

I did view the link you gave. Unfortunately, the links are not going to fulfill the purpose I have. Additionally, they are not the same Greek text, from which I am working.

I do have UTF-8 .txt files of the Greek NT. However, they are not sufficient for me. Either a file is a text that I don't want but has all of the accents and marks, or it is the text that I do want but has no accents or marks. Thus, I went in search of a way to produce this. And I came across a source for what I want, but it is just in "standard babel."

Thank you also for bringing up the XeLaTeX. I did figure out that this output would give me actual UTF-8. I did not know (yet) about the rest though, so I will check into the polyglossia package, which replaces babel, along with fontspec to choose fonts. Very much appreciated info! :D

...My question is: can someone help me with a way to create classes for what I have, since it apparently relies on them? (I think this is what I am looking for; if not, I am hoping someone will point me in the right direction.)

I am also not totally sure what was intended for all of the "footnotes" for each verse and how that would have been done in the output.

Basically, the snippet I gave is sufficient for someone to help me, since the codes are basically repetitive throughout the rest of the document anyway. (Obviously the text itself is not, but the codes are.)

So, at the beginning of each book, you would find the same "bibbook". For each verse, you would see "vs". And each verse has the "fn" line, which I was told means "footnote", but how to use that, I don't know. I haven't been told.

Thanks everyone, for your patience in reading this and helping! It's a cool journey I am on, with this LaTeX (XeLaTeX) project, I think!
localghost
Site Moderator
Posts: 9202
Joined: Fri Feb 02, 2007 12:06 pm

Re: Problems with seemingly unformatted Greek text

Post by localghost »

To me it seems as if you have been given some files which have to be included by a central master file. I suggest consulting the source of these files for further information.

I see some LaTeX markup in the code snippet you provided. But the commands seem not to be from vanilla LaTeX so they must have been defined somewhere else.


Thorsten
yakiv
Posts: 10
Joined: Fri Oct 01, 2010 2:02 pm

Re: Problems with seemingly unformatted Greek text

Post by yakiv »

@localghost - I agree with your assessment. The person who provided the text to me apparently has some kind of (as you called it, central master file), which they claimed is somewhere between 1 and 2 MB in size and they are not willing to provide it because it was something that their team developed. This is unfortunate (for me!), because I do not have a way to just output results quickly and easily. However, I was just grateful to have this text that could give me the Greek with accents, markings, etc.! ...I guess maybe everyone has been where I am, just happy to be 1000 times closer to the goal than you were before.

I thought about just taking the whole thing and redoing the LaTeX markup myself. I would at least get the Greek output that I need (in UTF-8) and then could work from there. But, if there was a simple mapping that could be done, based on another central master file (or as I was originally assuming, a set of classes to interpret what's already there), then I could get further, faster.

So yes, I believe you are 100% right, that the commands are basically LaTeX markup but not vanilla LaTeX.

Do you know of any (freely available) master file, that I could take and use (freely) and convert the present codes in my file to, based on what is in the (freely available) master file?

In other words, maybe someone offers a (free to use) central master file that was designed for outputting a large book that is made up of chapters and sections and subsections. Then I would adapt their master file to fit my Book > Books > Chapter > Verses, so that ultimately I get an output which is virtually the same as what I need. At face value, it didn't look the same; but from a logical standpoint, it is the same. I just adapt and am able to get done what needs to get done.

The only other thing I really need insight into is about these so-called "footnotes."

So, #1, Generally speaking, do you know of any freely available (free for use) central master files that I could download? And, #2, Specifically speaking, do you know of a place where I could get any freely available (free for use) central master files that would fit this use case and are downloadable?

Thanks for your help! I think we are getting somewhere! :D
localghost
Site Moderator
Posts: 9202
Joined: Fri Feb 02, 2007 12:06 pm

Problems with seemingly unformatted Greek text

Post by localghost »

yakiv wrote:[…] The person who provided the text to me apparently has some kind of (as you called it, central master file), which they claimed is somewhere between 1 and 2 MB in size and they are not willing to provide it because it was something that their team developed. […]
A master file of this size would contain several tens of thousands(!) of lines of code (if not a hundred thousand). But even if they don't want to publish their whole work, they should not so casually leave you with only the pieces you got. As things stand, I'm afraid you're at a loss.
yakiv wrote:[…] I thought about just taking the whole thing and redoing the LaTeX markup myself. I would at least get the Greek output that I need (in UTF-8) and then could work from there. But, if there was a simple mapping that could be done, based on another central master file (or as I was originally assuming, a set of classes to interpret what's already there), then I could get further, faster. […]
That's what it will come to: some kind of reconstruction. Getting a document in Greek is not a big deal. I am more concerned about formatting the whole document. You will have to define commands similar to those that can be seen in the code above. A sample of the final output would be very helpful. Otherwise I see no point from which to start.
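Just to illustrate what such a reconstruction could look like, here is an untested sketch with stand-in definitions for the four commands in the snippet. Everything about their meaning is guessed: that \bibbook takes a full title, a running title, an English name, an abbreviation and an optional extra; that \fn is an ordinary footnote; and that \gram marks a grammatical tag. The polutoniko attribute is there on the assumption that the ASCII transliteration (including ~ for the circumflex) needs polytonic mode. None of this reflects the original team's master file.

Code: Select all

% Untested sketch with guessed definitions; compile with pdflatex.
\documentclass{article}
\usepackage[greek]{babel}
\languageattribute{greek}{polutoniko} % polytonic Greek
\usepackage{xparse}                   % only needed on older LaTeX releases

% \bibbook{full title}{running title}{English name}{abbreviation}[extra]
% Guessed meaning of the arguments; only the full title is printed here.
\NewDocumentCommand{\bibbook}{m m m m o}{%
  \clearpage
  \begin{center}\Large #1\end{center}%
}

% \vs Mat 1:1 <text>: the reference is read as "book chapter:verse" and
% the verse text then continues as an ordinary paragraph.
\def\vs #1 #2:#3 {\par\noindent\textbf{#2:#3}\quad}

% \fn{...}: assumed to be a plain footnote attached to the verse.
\newcommand{\fn}[1]{\footnote{#1}}

% \gram{...}: assumed to be a grammatical tag, set raised in Latin letters.
\newcommand{\gram}[1]{\textsuperscript{\textlatin{#1}}}

\begin{document}
\bibbook{TO KATA MATJAION\\AGION EUAGGELION}{KATA MATJAION}{Matthew}{Mat}[\textgreek{KATA MATJAION}]
\vs Mat 1:1 B'ibloc gen'esewc >Ihso~u Qristo~u, u<io~u Dab`id, u<io~u >Abra'am.
\fn{b'ibloc\gram{nnfs} g'enesic\gram{ngfs} >Ihso~uc\gram{ngms} Qrist'oc\gram{ngms} u<i'oc\gram{ngms} Dau'id\gram{tp} u<i'oc\gram{ngms} >Abra'am\gram{tp}.}
\end{document}

The real layout can only be matched once a sample of the intended output is available; the definitions above merely make the existing markup compile.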
yakiv wrote:[…] Do you know of any (freely available) master file, that I could take and use (freely) and convert the present codes in my file to, based on what is in the (freely available) master file? […]
I'm afraid such a file doesn't exist. Think about the time (and perhaps money) the team invested in the development of this huge master file. Similar stuff will be hard to find.
yakiv wrote:[…] In other words, maybe someone offers a (free to use) central master file that was designed for outputting a large book that is made up of chapters and sections and subsections. Then I would adapt their master file to fit my Book > Books > Chapter > Verses, so that ultimately I get an output which is virtually the same as what I need. At face value, it didn't look the same; but from a logical standpoint, it is the same. I just adapt and am able to get done what needs to get done. […]
Building a master file is in principle not the difficulty. But as already mentioned, we have no starting point, no base. We need at least something that lets us see what the document looks like, its structure, its layout and so on.
yakiv wrote:[…] The only other thing I really need insight into is about these so-called "footnotes." […]
Also feasible, but it depends on the same missing information as the points above.
yakiv wrote:[…] So, #1, Generally speaking, do you know of any freely available (free for use) central master files that I could download? And, #2, Specifically speaking, do you know of a place where I could get any freely available (free for use) central master files that would fit this use case and are downloadable? […]
I have to answer no to both questions. The only possibility I see is to build a master file from scratch. But without the essential information mentioned above, I'm very pessimistic.
yakiv
Posts: 10
Joined: Fri Oct 01, 2010 2:02 pm

Re: Problems with seemingly unformatted Greek text

Post by yakiv »

Over the past couple of days, I made major progress with LaTeX and the project I am working on. I am really enjoying it!

I have been able to produce a basic document that I will be able to extract the text from. I basically used several sophisticated regex find and replace commands to strip out what I didn't need for my project. I gave the document a basic structure and I am on my way to a successful project, with what I actually need to do.

There is only one issue that I am researching right now and not sure how to fix. Basically, the PDF is produced and all of the lines look fine in the document. However, when I copy and paste the text into notepad, the lines (past a certain length) are cut off. This is probably "normal" for LaTeX, but how do I get it to not do that? I need the lines "whole" and not hard wrapped.

Once I resolve that last issue, I would say that this thread is "solved".
frabjous
Posts: 2064
Joined: Fri Mar 06, 2009 12:20 am

Re: Problems with seemingly unformatted Greek text

Post by frabjous »

A PDF file is basically just a map containing the exact position of various characters and other elements. Typically it doesn't even keep track of where one word ends and another begins, much less the difference between hard-wrapped and soft-wrapped text. Typically, when you copy and paste from a PDF, it's up to the "artificial intelligence" of the PDF viewer to determine how to interpret the text from the locations alone. Different PDF software may be better or worse at doing this.

Recently, however, a new extension to the PDF format called "tagged PDF" has been created that might allow PDFs to store information about lines and paragraph breaks, etc.; there has been some initial work on making use of this technology for pdflatex, but I think it's still in its infancy, so as of now, there's no way to implement this. I'm not sure many PDF viewers know what to do with the tags yet anyway.

How is it that you "need" soft-wrapped text, however? Why create beautifully typeset text only to copy it somewhere else? Wouldn't there be a better way to handle this?
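
If copying out of the PDF stays the route, one blunt workaround is to keep TeX from ever needing to break a line: make the page so wide that each verse fits on a single line, with one paragraph per verse. The sketch below assumes XeLaTeX, an arbitrary 200 cm page width and the font "GFS Didot" (all my assumptions), and it is no guarantee that every PDF viewer will hand back one line per verse when copying.

Code: Select all

% Sketch of a "no wrapping" layout for text extraction; compile with xelatex.
\documentclass{article}
% Assumption: 200cm is wide enough for the longest verse at this font size.
\usepackage[paperwidth=200cm, paperheight=29.7cm, margin=1cm]{geometry}
\usepackage{fontspec}
\setmainfont{GFS Didot}   % assumption: any font with polytonic Greek will do
\pagestyle{empty}         % no page numbers to pollute the copied text
\setlength{\parindent}{0pt}
\raggedright              % never stretch a line to the full width
\hyphenpenalty=10000      % discourage hyphenation
\exhyphenpenalty=10000
\begin{document}
Mat 1:1 Βίβλος γενέσεως Ἰησοῦ Χριστοῦ, υἱοῦ Δαβὶδ, υἱοῦ Ἀβραάμ.\par
% ... one paragraph per verse ...
\end{document}

The resulting PDF is useless for reading, of course; it would only serve as an intermediate file for getting the text out.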
yakiv
Posts: 10
Joined: Fri Oct 01, 2010 2:02 pm

Re: Problems with seemingly unformatted Greek text

Post by yakiv »

@frabjous - thanks for your informative reply!

...Basically, the only format that I have the whole text in is "standard babel" Greek. So, in order to get the whole text in UTF-8, I have been working with LaTeX, specifically XeLaTeX to get it "converted" for my needs.

So, since I don't know of any way to convert it from babel to UTF-8 .txt (rather than PDF), I am trying to produce the right result in the PDF, where I will be able to copy and paste out.

So far, I am 99% successful in getting the right result as far as the Greek text itself is concerned. But I now have this additional problem: when I copy the text from the PDF and paste it into Notepad, there are hard breaks in the text. What was one line can end up on three or four lines. I don't want to have to "fix" this (manually or programmatically) in the text file if there is a simple setting that can be used in my .tex file.

...I am having a lot of fun learning the LaTeX. I am actually formatting the document more and more, as I go along. I even put bookmarks in and a title page.

...Normally, when I make a document in Word and print to PDF and want to copy text out and paste it, the text does not have these hard breaks. So, I think it is a formatting or setting issue in LaTeX. I hope maybe someone can offer a setting.
yakiv
Posts: 10
Joined: Fri Oct 01, 2010 2:02 pm

Re: Problems with seemingly unformatted Greek text

Post by yakiv »

If anyone else has some insight into the line breaks (and getting LaTeX to not break the lines in the output), I would be very appreciative! :)