LukeDrew
Posts: 4
Joined: Wed Apr 03, 2024 8:50 pm

Indexing terms from included PDF file (pdfpages+imakeidx+hyperref)

Post by LukeDrew »

Hello,

As it is my first topic, I'd like to say hello to the whole community!

I am writing a PhD thesis that consists of several chapters followed by several articles, which are added using the \includepdf command from the pdfpages package.

I would like to add some terms from the included PDF files to the index, with correct page numbers and working hyperlinks. I want the index to behave as if these PDF files were ordinary content of the document. In other words, if page 3 of the docA.pdf file is rendered on page 15 of the thesis and contains term1, the index entry should look like this:

term1, 15

And clicking on page number 15 would move to the respective page in the document (normal imakeidx + hyperref cooperation).

Below is a minimal example (unfortunately it won't compile without the PDF file). I also attach the code together with the PDF file so that it compiles without errors:
pdfpages_with_makeindex_MWE.zip
In this example:
  • Index terms from regular content work as expected
  • I was able to add an index term linked to the first included PDF page using the picturecommand* key
  • I was able to add an index term linked to the whole included page range using the picturecommand key, but that is not very useful
  • I have no idea how to add index terms linked to a single included page other than the first one
Some solutions I considered:
  1. Finding or creating a command that would let me define an index term in a way similar to \index{}, but that would accept e.g. a PDF filename / linkname and a page number.
  2. A command that would allow adding lines directly to the *.idx file (similar to \addtocontents, which allows adding material to the *.toc, *.lof and *.lot files).
  3. A command that would allow placing \index{} on a specific page of the final document (computed, for example, from the number of the first included page plus an offset).
Since my knowledge of and experience with hacking LaTeX are limited, I would appreciate any help, suggestions or solutions.

Cheers,
Luke

Code:

\documentclass{book}
\usepackage{lipsum}
\usepackage{pdfpages}
\includepdfset{pages=-,frame,scale=0.65,pagecommand={},link}
\usepackage{imakeidx}
\makeindex
\usepackage{hyperref}


\begin{document}

\tableofcontents

\mainmatter  % reset page number, start numbering chapters using Arabic numerals

\chapter{Introduction}

\lipsum[1-2]\index{lipsum}


\chapter{PDF inclusion}

On the next page I include a PDF \index{PDF} file\ldots

I can link specific PDF page, e.g.: \hyperlink{pdf:gravity.3}{pdf:gravity.3}.

\includepdf[
    linkname=pdf:gravity,
    pages={2-4},
    addtotoc={
        3, section, 1,
        {Third page of the PDF file}, % adding \index{someterm} here results in an error
        sec:GravitySecIII
    },
    addtolist={
        2, figure,
        {Some figure description},  % adding \index{someterm} here results in an error
        fig:GravityFig1
    },
    picturecommand*={
        \put(10,750){Here I can check the page number on which the PDF inclusion starts (page \thepage) and maybe store it in some variable}\index{first PDF page is easy, but what with others?}
    },
    picturecommand={
        \index{page range for the whole PDF is also working, but I want to be specific}
    }
    ]{1602.03837.pdf}  % https://arxiv.org/pdf/1602.03837

\listoffigures

\printindex

\end{document}

Stefan Kottwitz
Site Admin
Posts: 10320
Joined: Mon Mar 10, 2008 9:44 pm

Post by Stefan Kottwitz »

Hi Luke,

welcome to the forum!

I uploaded the PDF file to the online compiler, so your document can be compiled directly here in the forum.

Great that you found a way to link an index entry to the first page! My quick suggestion: include the PDFs page by page, so that every page is a "first page" and the same approach works for each of them. If you don't want to copy and paste many \includepdf commands, one for each page of the included PDFs, you could use a macro or a for loop. Possibly there are not that many pages anyway.
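
A minimal sketch of this page-by-page idea, assuming a hypothetical helper \IncludeIndexedPage (it is not part of pdfpages) and the PDF file from the example above. Each call includes a single page, so picturecommand* always applies to the page that carries the index term:

```latex
\documentclass{book}
\usepackage{pdfpages}
\usepackage{imakeidx}
\makeindex
\usepackage{hyperref}

% Hypothetical helper, for illustration only:
%   #1 = page number inside the PDF, #2 = index term, #3 = PDF file name
\newcommand{\IncludeIndexedPage}[3]{%
  \includepdf[pages={#1}, pagecommand={}, picturecommand*={\index{#2}}]{#3}%
}

\begin{document}
\mainmatter
\chapter{Papers}

% one call per included page; every page is "the first" of its own inclusion
\IncludeIndexedPage{2}{term on PDF page 2}{1602.03837.pdf}
\IncludeIndexedPage{3}{term on PDF page 3}{1602.03837.pdf}
\IncludeIndexedPage{4}{term on PDF page 4}{1602.03837.pdf}

\printindex
\end{document}
```

The helper accepts only one index term per page; several \index commands could be passed inside the second argument's place by extending the macro, which is left out here to keep the sketch short.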

Stefan
LaTeX.org admin
LukeDrew
Posts: 4
Joined: Wed Apr 03, 2024 8:50 pm

Post by LukeDrew »

Hi Stefan, thank you for uploading the file and for the inspiring suggestions!

As to numbers, I have 50+ PDF pages included using \includepdf, with 40+ items added using the addtotoc option and 70+ figures/tables added to the LoF/LoT using the addtolist option. I expect a significant number of index terms to be added. Therefore I would like a reasonably compact solution in which page numbers are not repeated where they don't need to be.

This probably requires writing a command that offers an interface similar to that of \includepdf, e.g.:

Code:

\MyIncludePdf[
    linkname=pdf1,
    pages={14-21},
    % list with n*5 elements:
    addtotoc={
        14, section,    1, {Paper Title}, sec:pdf1_title,
        14, subsection, 2, {Section 1},   sec:pdf1_sec1,
        % ...
        21, subsection, 2, {Section 8},   sec:pdf1_sec8,
        21, subsection, 2, {Section 9},   sec:pdf1_sec9
    },
    % list with n*4 elements:
    addtolist={
        14, figure, {Fig.~1. Caption}, fig:pdf1_fig1,
        14, figure, {Fig.~2. Caption}, fig:pdf1_fig2,
        % ...
        21, table,  {Tab.~1. Caption}, fig:pdf1_tab1,
        21, figure, {Fig.~8. Caption}, fig:pdf1_fig8
    },
    % list with n*2 elements:
    mypicturecommands={  % this option is not present in the standard \includepdf
        14, {\put(10,50){sometext}\index{someterm}},
        % ...
        21, {\put(10,50){othertext}\index{otherterm}}
    }
]{filename.pdf}
I started implementing it using expl3 syntax. The example is below; I tried to cut out the non-essential parts (input checking, etc.) to keep it reasonably small.

Code:

\documentclass{book}
\usepackage{pdfpages}

\ExplSyntaxOn
\keys_define:nn { mykeys } {
    linkname        .tl_set:N    = \l_linkname_tl,
    pages           .tl_set:N    = \l_pages_tl,
    addtotoc        .clist_set:N = \l_addtotoc_clist,
}

\NewDocumentCommand{\MyIncludePdf}{ O{} m }{
    \group_begin:
    \keys_set:nn { mykeys } { #1 }  % read key-value pairs from the first optional argument
    
    % ==> `pages' option <== %  now only supports format: 1-5
    \seq_new:N \l_page_range_seq  % 2-element page range seq
    \int_new:N \l_first_page_int  % first page number
    \int_new:N \l_last_page_int   % last page number
    \seq_new:N \l_pages_seq       % sequence of all pages between the first and the last
    \seq_set_split:NnV \l_page_range_seq {-} \l_pages_tl
    \int_set:Nn \l_first_page_int {\seq_item:Nn \l_page_range_seq 1}
    \int_set:Nn \l_last_page_int  {\seq_item:Nn \l_page_range_seq 2}
    \int_step_inline:nnn \l_first_page_int \l_last_page_int {
        \seq_put_right:Nn \l_pages_seq {##1}
    }
    Page~numbers~sequence:~\seq_use:Nn \l_pages_seq {,~} \\  % working fine with proper input

    % ==> `addtotoc' option <== texdoc pdfpages, see \includepdf, addtotoc option
    % addtotoc={<page number>,<section>,<level>,<heading>,<label>,<page number>,...}
    % number of elements in a `addtotoc' list should be a multiple of 5
    \prop_new:N \l_per_page_addtotoc_prop  % key: page number, value: clist (subset of addtotoc for a given page)
    \tl_new:N \l_page_number_tl
    \tl_new:N \l_section_tl
    \tl_new:N \l_level_tl
    \tl_new:N \l_heading_tl
    \tl_new:N \l_label_tl
    \bool_until_do:nn {\clist_if_empty_p:N \l_addtotoc_clist} {
        % get 5-element spec from addtotoc list
        \clist_pop:NN \l_addtotoc_clist \l_page_number_tl
        \clist_pop:NN \l_addtotoc_clist \l_section_tl
        \clist_pop:NN \l_addtotoc_clist \l_level_tl
        \clist_pop:NN \l_addtotoc_clist \l_heading_tl
        \clist_pop:NN \l_addtotoc_clist \l_label_tl
        % append 5-element spec to a list for a specific page
        \prop_get:NVNTF \l_per_page_addtotoc_prop \l_page_number_tl \l_tmpa_clist 
            {} {\clist_clear:N \l_tmpa_clist}
        \clist_put_right:NV \l_tmpa_clist \l_page_number_tl
        \clist_put_right:NV \l_tmpa_clist \l_section_tl
        \clist_put_right:NV \l_tmpa_clist \l_level_tl
        \clist_put_right:NV \l_tmpa_clist { { \l_heading_tl } }  % PROBLEM 1. What if \l_heading_tl contains commas and is surrounded with braces? It seems not to be working as documented
        \clist_put_right:NV \l_tmpa_clist \l_label_tl
        \clist_show:N \l_tmpa_clist  % DEBUG (see terminal)
        % put updated spec back in the property list (key is the page number)
        \prop_put:NVV \l_per_page_addtotoc_prop {\l_page_number_tl} {\l_tmpa_clist}
    }
    \prop_show:N \l_per_page_addtotoc_prop  % DEBUG (see terminal)

    % ==> call \includepdf for each page <==
    \prop_new:N \l_includepdf_prop  % property list of key-value pairs to be passed as a first arg to \includepdf (after expansion?)
    \seq_map_inline:Nn \l_pages_seq {  % for each page
        PAGE~NUMBER:~##1 \\
        \prop_clear:N \l_includepdf_prop
        \prop_put:NnV \l_includepdf_prop {linkname} {\l_linkname_tl}
        \prop_put:Nnn \l_includepdf_prop {pages} ##1
        % not all pages will have addtotoc items defined, that's why I build key-val in advance!
        % I cannot pass addtotoc={} to \includepdf, this results in warnings
        \prop_get:NnNTF \l_per_page_addtotoc_prop ##1 \l_tmpa_clist {
            \prop_put:NnV \l_includepdf_prop {addtotoc} \l_tmpa_clist
        }{}
        \prop_show:N \l_includepdf_prop  % DEBUG (see terminal)
        
        % PROBLEM 2. This fails. How to pass a list of key-value pairs as a single variable to \includepdf?
        % \exp_args:Nn \includepdf [\l_includepdf_prop] {#2}  
        % I suppose that \l_includepdf_prop must be either converted to clist or another datatype and expanded
        % but all of my attempts to make it work failed

        % can this help?
        % use \prop_to_keyval and treat as clist or tl? 
        % \clist_set:NV \l_tmpb_clist {\prop_to_keyval:N \l_includepdf_prop}
        % \clist_show:N \l_tmpb_clist  % DEBUG (see terminal) strange output
        % \tl_set:NV \l_tmpb_tl {\prop_to_keyval:N \l_includepdf_prop}
        % \tl_show:N \l_tmpb_tl        % DEBUG (see terminal) strange output
        
    }
    \group_end:
}
\ExplSyntaxOff

% ================================================================== %
\begin{document}
\mainmatter
\MyIncludePdf[
    linkname=pdf:carbon,
    pages={2-4},
    addtotoc={
        2, section,    1, {Paper \textbf{The, title}. Some text, WITH, COMMAS! }, sec:title1, % PROBLEM 1, see line 50
        2, subsection, 2, {Section~1. Title}, sec:sec1,
        4, subsection, 2, {Section~3. Another text}, sec:sec3
    },
    ]{1602.03837.pdf}
\end{document}
I got stuck on two (possibly newbie) problems:

PROBLEM 1. Every element at position 5n+4 of the addtotoc list is the heading that lands in the table of contents. As such, it may contain commas, so I put it in braces, e.g. {Paper \textbf{The, title}. Some text, WITH, COMMAS! } in the example above. In line 42 of the code, I pop this heading (\l_heading_tl) from one comma-separated list (clist type), and in line 50 I put it on the right of another clist. When I list the contents of that other list (the \clist_show:N call in line 52), I see the following:

Code:

The comma list \l_tmpa_clist contains the items (without outer braces):
>  {2}
>  {section}
>  {1}
>  {Paper \textbf {The, title}. Some text}
>  {WITH}
>  {COMMAS!}
>  {sec:title1}.
But I would expect:

Code:

The comma list \l_tmpa_clist contains the items (without outer braces):
>  {2}
>  {section}
>  {1}
>  {Paper \textbf {The, title}. Some text, WITH, COMMAS!}
>  {sec:title1}.
I suppose something is wrong with the way I put the heading token list back into the clist in line 50:

Code:

\clist_put_right:NV \l_tmpa_clist { { \l_heading_tl } }
I tried to adhere to texdoc interface3, page 185 in the PDF (TL2023):
To append some ⟨tokens⟩ as a single ⟨item⟩ even if the ⟨tokens⟩ contain commas or spaces, add a set of braces: \clist_put_right:Nn ⟨clist var⟩ { {⟨tokens⟩} }.
I do not know what I am doing wrong, but the extra set of braces does not change anything.
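
A minimal sketch of one way around this, under the assumption that the cause is the V-type expansion: with \clist_put_right:NV, the braces typed around \l_heading_tl only delimit the argument, so the value reaches \clist_put_right:Nn without an extra brace level and is split at its commas. The helper name \mydemo_clist_put_right_braced:Nn and the \l_mydemo_... variables are hypothetical. The braces are added inside the helper, after the variable has been expanded:

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical helper: append #2 to the clist #1 as ONE item,
% even if #2 contains commas.
\cs_new_protected:Npn \mydemo_clist_put_right_braced:Nn #1#2
  { \clist_put_right:Nn #1 { {#2} } }
% The NV variant first fetches the value of the variable,
% then hands it to the base function, which adds the braces.
\cs_generate_variant:Nn \mydemo_clist_put_right_braced:Nn { NV }

\tl_new:N    \l_mydemo_heading_tl
\clist_new:N \l_mydemo_items_clist

\tl_set:Nn \l_mydemo_heading_tl { Some~text,~WITH,~COMMAS! }
\clist_put_right:Nn \l_mydemo_items_clist { section }
\mydemo_clist_put_right_braced:NV \l_mydemo_items_clist \l_mydemo_heading_tl

% DEBUG (see terminal): the heading should now be listed as a single item
\clist_show:N \l_mydemo_items_clist
\ExplSyntaxOff
\begin{document}
Check the terminal output.
\end{document}
```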

PROBLEM 2. Assuming that I call \includepdf page by page, not all pages will use addtotoc etc. I cannot pass addtotoc={} for the pages that do not, because this results in warnings. Therefore I decided to build a list of key-value pairs using the prop datatype and then pass it as the optional first argument to \includepdf for each PDF page separately. This does not work, and after some trial and error I still do not know which data type and which expansion type to choose to make it work (rumors about incompatibilities between the LaTeX2e, pgfkeys and expl3 key-value interfaces confuse me further). The code related to this problem is in lines 72-82 (the commented-out block under the PROBLEM 2 comment).
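
A minimal sketch of one possible approach, assuming the options are collected in a token list rather than a prop: \includepdf has to see the literal key-value text after its opening bracket, so the variable is expanded once, without braces, before \includepdf runs. The command \MyIncludeOnePage and the variable \l_mydemo_options_tl are hypothetical names used only for illustration:

```latex
\documentclass{book}
\usepackage{pdfpages}

\ExplSyntaxOn
\tl_new:N \l_mydemo_options_tl  % hypothetical: holds the key-value list

% #1 = PDF page number
% #2 = extra \includepdf options (may be empty)
% #3 = PDF file name
\NewDocumentCommand { \MyIncludeOnePage } { m m m }
  {
    \tl_set:Nn \l_mydemo_options_tl { pages = {#1} }
    % append further keys only if there is something to append,
    % so an empty addtotoc={} is never passed on
    \tl_if_blank:nF {#2}
      { \tl_put_right:Nn \l_mydemo_options_tl { , #2 } }
    % insert the VALUE of the variable, unbraced, right after the [
    \exp_last_unbraced:NNV \includepdf [ \l_mydemo_options_tl ] {#3}
  }
\ExplSyntaxOff

\begin{document}
\mainmatter
\MyIncludeOnePage{2}{}{1602.03837.pdf}
\MyIncludeOnePage{3}
  {addtotoc={3, section, 1, {Third page of the PDF file}, sec:third}}
  {1602.03837.pdf}
\end{document}
```

Because the extra options are typed in the document body, they are read with ordinary category codes; options assembled inside an expl3 code region may need more care with spaces and catcodes.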

I would appreciate any tips how to solve this.
LukeDrew
Posts: 4
Joined: Wed Apr 03, 2024 8:50 pm

Post by LukeDrew »

I have finished the code that solves the problem and post it below; maybe it will be useful to someone in the future. It may require some adjustment, since only a tiny subset of the \includepdf options is wrapped. Thank you once more for the inspiration.

pdfpw.sty:

Code:

\newcommand{\ExplFileName}{pdfpw}  % see texdoc interface3, \GetIdInfo

\ProvidesExplPackage{\ExplFileName}{2024-04-04}{0.0.1}{
    pdfpages \\includepdf wrapper with per-page picturecommand
}

\RequirePackage{pdfpages}


% ============================================================================ %
% Auxiliary functions
% ============================================================================ %

% https://tex.stackexchange.com/questions/572885/how-do-i-add-the-value-of-a-token-list-with-braces-in-latex
\cs_new_protected:Nn \__pdfpw_tl_put_right_braced:Nn {
    \tl_put_right:Nn #1 { { #2 } }
}
\cs_generate_variant:Nn \__pdfpw_tl_put_right_braced:Nn { NV, cV, cv, Nx, cx }


\cs_new_protected:Npn \__pdfpw_check_clist_length:Nnn #1#2#3 {
    % #1 - clist
    % #2 - option name
    % #3 - multiple required
    % throw an error if number of elements in a clist is not a multiple of a given integer
    \group_begin:
    \int_set:Nn \l_tmpa_int {\clist_count:N #1}
    \int_compare:nNnF {\int_mod:nn \l_tmpa_int {#3}} = 0 {
        \msg_fatal:nn{\ExplFileName}{
            Wrong~count~of~elements~in~a~{#2}~clist,~should~be~a~multiple~of~{#3}!
        }
    }
    \group_end:
}


% ============================================================================ %
% \includepdf wrapper (\MyIncludePdf)
% ============================================================================ %

\keys_define:nn { pdfpwkeys }
{
    linkname        .tl_set:N    = \l_pdfpw_linkname_tl,
        % same as in \includepdf
    pages           .tl_set:N    = \l_pdfpw_pages_tl,
        % supported format: 2-4 (only)
    pageoffset      .int_set:N   = \l_pdfpw_pageoffset_int,
        % e.g. pageoffset=1 if page numbering in the PDF starts from the second page
        % pageoffset={-14} if first page in the PDF has number 15, etc.
        % This is an additional feature implemented here for convenience
        % offset-aware page numbers can be used in addtotoc, addtolist, picturecommands
    addtotoc        .clist_set:N = \l_pdfpw_addtotoc_clist,
        % same as in \includepdf, but pageoffset-aware
    addtolist       .clist_set:N = \l_pdfpw_addtolist_clist,
        % same as in \includepdf, but pageoffset-aware
    picturecommand  .tl_set:N    = \l_pdfpw_picturecommand_tl,
        % same as in \includepdf
    picturecommands .clist_set:N = \l_pdfpw_picturecommands_clist
        % format <page number>, <command>, length: 2*n.
        % Translated to per-page picturecommand* as understood by \includepdf
}

% integers are declared once here, outside the functions
% (\int_new:N must not be repeated; \int_zero_new:N would be an alternative):
\int_new:N \l_pdfpw_first_page_int
\int_new:N \l_pdfpw_last_page_int
%
\cs_new_protected:Npn \__pdfpw_process_pages: {
    % ------------------------------------------------------------------------ %
    % this function reads page range and pageoffset and generates a sequence of page numbers
    % e.g. for pages={1-3} and pageoffset=1 given by the user, \l_pdfpw_pages_seq is: 2, 3, 4
    % ------------------------------------------------------------------------ %
    \seq_gclear_new:N \l_pdfpw_page_range_seq
    \seq_gclear_new:N \l_pdfpw_pages_seq
    %
    \seq_set_split:NnV \l_pdfpw_page_range_seq {-} \l_pdfpw_pages_tl
    % split should result in a 2-element sequence:
    \int_compare:nNnF {\seq_count:N \l_pdfpw_page_range_seq} = 2 {
        \msg_fatal:nn{\ExplFileName}{Pages~range~format~not~supported!}
    }
    % generate sequence (pageoffset-aware):
    \int_set:Nn \l_pdfpw_first_page_int {\seq_item:Nn \l_pdfpw_page_range_seq 1}
    \int_set:Nn \l_pdfpw_last_page_int {\seq_item:Nn \l_pdfpw_page_range_seq 2}
    \int_add:Nn \l_pdfpw_first_page_int {\l_pdfpw_pageoffset_int}
    \int_add:Nn \l_pdfpw_last_page_int {\l_pdfpw_pageoffset_int}
    \int_step_inline:nnn \l_pdfpw_first_page_int \l_pdfpw_last_page_int {
        \seq_gput_right:Nn \l_pdfpw_pages_seq {##1}
    }
}


\int_new:N \l_pdfpw_addtotoc_page_number_int  % page number as understood by \includepdf
%
\cs_new_protected:Npn \__pdfpw_process_addtotoc: {
    % ------------------------------------------------------------------------ %
    % see texdoc pdfpages, \includepdf command, experimental options, addtotoc
    % this function repacks clist like this:
    %   addtotoc={
    %       1, section, 1, {Title}, labelt,
    %       1, subsection, 2, {SectionA}, labela,
    %       2, subsection, 2, {SectionB}, labelb,
    %       2, subsection, 2, {SectionC}, labelc,
    %   }
    % to property list, where:
    %     - key is a page number (plus pageoffset if non-zero)
    %     - value is a subset of the list above only for this page
    % e.g. for the list above
    % \l_pdfpw_per_page_addtotoc_prop consists of key-value pairs:
    % >  {1}  =>  {1, section, 1, {Title}, labelt, 1, subsection, 2, {SectionA}, labela}
    % >  {2}  =>  {2, subsection, 2, {SectionB}, labelb, 2, subsection, 2, {SectionC}, labelc}
    % 
    % ASSUMED that pageoffset=0, otherwise this offset is included in page number calculations
    % ------------------------------------------------------------------------ %
    % 
    \prop_gclear_new:N \l_pdfpw_per_page_addtotoc_prop
    % addtotoc={} length should be a multiple of 5:
    \__pdfpw_check_clist_length:Nnn \l_pdfpw_addtotoc_clist {addtotoc} 5
    %
    \tl_gclear_new:N \l_pdfpw_addtotoc_page_number_tl  % page number as input by the user (pageoffset-aware)
    \tl_gclear_new:N \l_pdfpw_addtotoc_section_tl
    \tl_gclear_new:N \l_pdfpw_addtotoc_level_tl
    \tl_gclear_new:N \l_pdfpw_addtotoc_heading_tl
    \tl_gclear_new:N \l_pdfpw_addtotoc_label_tl
    %
    \bool_until_do:nn {\clist_if_empty_p:N \l_pdfpw_addtotoc_clist} {
        \clist_pop:NN \l_pdfpw_addtotoc_clist \l_pdfpw_addtotoc_page_number_tl
        \clist_pop:NN \l_pdfpw_addtotoc_clist \l_pdfpw_addtotoc_section_tl
        \clist_pop:NN \l_pdfpw_addtotoc_clist \l_pdfpw_addtotoc_level_tl
        \clist_pop:NN \l_pdfpw_addtotoc_clist \l_pdfpw_addtotoc_heading_tl
        \clist_pop:NN \l_pdfpw_addtotoc_clist \l_pdfpw_addtotoc_label_tl
        %
        \int_set:Nn
            \l_pdfpw_addtotoc_page_number_int
            {\l_pdfpw_addtotoc_page_number_tl + \l_pdfpw_pageoffset_int}
        %
        \seq_if_in:NVTF \l_pdfpw_pages_seq \l_pdfpw_addtotoc_page_number_int {
            \prop_get:NVNF
                \l_pdfpw_per_page_addtotoc_prop
                \l_pdfpw_addtotoc_page_number_int
                \l_tmpa_seq 
                {\seq_clear:N \l_tmpa_seq}
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtotoc_page_number_int
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtotoc_section_tl
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtotoc_level_tl
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtotoc_heading_tl
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtotoc_label_tl
            \prop_put:NVV
                \l_pdfpw_per_page_addtotoc_prop
                {\l_pdfpw_addtotoc_page_number_int}
                {\l_tmpa_seq}
        } {
            \msg_fatal:nn{\ExplFileName}{
                addtotoc:~page~\l_pdfpw_addtotoc_page_number_tl~~out~of~range
            }
        }
    }
}


\int_new:N \l_pdfpw_addtolist_page_number_int  % page number as understood by \includepdf
%
\cs_new_protected:Npn \__pdfpw_process_addtolist: {
    % ------------------------------------------------------------------------ %
    % see texdoc pdfpages, \includepdf command, experimental options, addtolist
    % this function is similar to \__pdfpw_process_addtotoc, see its description
    % the difference is that it processes `addtolist' option, not `addtotoc'
    % ------------------------------------------------------------------------ %
    %
    \prop_gclear_new:N \l_pdfpw_per_page_addtolist_prop
    % addtolist={} length should be a multiple of 4:
    \__pdfpw_check_clist_length:Nnn \l_pdfpw_addtolist_clist {addtolist} 4
    %
    \tl_gclear_new:N \l_pdfpw_addtolist_page_number_tl  % page number as input by the user (pageoffset-aware)
    \tl_gclear_new:N \l_pdfpw_addtolist_type_tl
    \tl_gclear_new:N \l_pdfpw_addtolist_heading_tl
    \tl_gclear_new:N \l_pdfpw_addtolist_label_tl
    %
    \bool_until_do:nn {\clist_if_empty_p:N \l_pdfpw_addtolist_clist} {
        \clist_pop:NN \l_pdfpw_addtolist_clist \l_pdfpw_addtolist_page_number_tl
        \clist_pop:NN \l_pdfpw_addtolist_clist \l_pdfpw_addtolist_type_tl
        \clist_pop:NN \l_pdfpw_addtolist_clist \l_pdfpw_addtolist_heading_tl
        \clist_pop:NN \l_pdfpw_addtolist_clist \l_pdfpw_addtolist_label_tl
        %
        \int_set:Nn
            \l_pdfpw_addtolist_page_number_int
            {\l_pdfpw_addtolist_page_number_tl + \l_pdfpw_pageoffset_int}
        %
        \seq_if_in:NVTF \l_pdfpw_pages_seq \l_pdfpw_addtolist_page_number_int {
            \prop_get:NVNF
                \l_pdfpw_per_page_addtolist_prop
                \l_pdfpw_addtolist_page_number_int
                \l_tmpa_seq 
                {\seq_clear:N \l_tmpa_seq}
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtolist_page_number_int
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtolist_type_tl
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtolist_heading_tl
            \seq_put_right:NV \l_tmpa_seq \l_pdfpw_addtolist_label_tl
            \prop_put:NVV
                \l_pdfpw_per_page_addtolist_prop
                {\l_pdfpw_addtolist_page_number_int}
                {\l_tmpa_seq}
        } {
            \msg_fatal:nn{\ExplFileName}{
                addtolist:~page~\l_pdfpw_addtolist_page_number_tl~~out~of~range
            }
        }
    }
}


\int_new:N \l_pdfpw_picturecommands_page_number_int  % page number as understood by \includepdf
%
\cs_new_protected:Npn \__pdfpw_process_picturecommands: {
    % ------------------------------------------------------------------------ %
    % see texdoc pdfpages, \includepdf command, layout options, picturecommand*
    % original option allows to put picturecommand* only on the first page
    % I want to be able to put DIFFERENT commands for different pages
    % (e.g. \index{some term})
    %
    % this function reads clist like this:
    % picturecommands={
    %     1, command1,
    %     2, command2
    % }
    % to property list, where:
    %     - key is a page number (plus pageoffset if non-zero)
    %     - value is a command for this page (only one per page is possible)
    % e.g. for the list above
    % \l_pdfpw_per_page_picturecommands_prop consists of key-value pairs:
    % >  {1}  =>  {command1}
    % >  {2}  =>  {command2}
    % 
    % ASSUMED that pageoffset=0, otherwise this offset is included in page number calculations
    % ------------------------------------------------------------------------ %
    %
    \prop_gclear_new:N \l_pdfpw_per_page_picturecommands_prop
    % picturecommands={} length should be a multiple of 2:
    \__pdfpw_check_clist_length:Nnn \l_pdfpw_picturecommands_clist {picturecommands} 2
    %
    \tl_gclear_new:N \l_pdfpw_picturecommands_page_number_tl  % page number as input by the user (pageoffset-aware)
    % will be passed to picturecommand* option of \includepdf:
    \tl_gclear_new:N \l_pdfpw_picturecommands_command_tl
    %
    \bool_until_do:nn {\clist_if_empty_p:N \l_pdfpw_picturecommands_clist} {
        \clist_pop:NN \l_pdfpw_picturecommands_clist \l_pdfpw_picturecommands_page_number_tl
        \clist_pop:NN \l_pdfpw_picturecommands_clist \l_pdfpw_picturecommands_command_tl
        %
        \int_set:Nn
            \l_pdfpw_picturecommands_page_number_int
            {\l_pdfpw_picturecommands_page_number_tl + \l_pdfpw_pageoffset_int}
        %
        \seq_if_in:NVTF \l_pdfpw_pages_seq \l_pdfpw_picturecommands_page_number_int {
            \prop_get:NVNT
                \l_pdfpw_per_page_picturecommands_prop
                \l_pdfpw_picturecommands_page_number_int
                \l_tmpa_tl
                {\msg_fatal:nn{\ExplFileName}{
                    picturecommands:~picturecommand*~for~page~ \l_pdfpw_picturecommands_page_number_tl~~has~already~been~defined
                    }} 
            \prop_put:NVV
                \l_pdfpw_per_page_picturecommands_prop
                {\l_pdfpw_picturecommands_page_number_int}
                {\l_pdfpw_picturecommands_command_tl}
        } {
            \msg_fatal:nn{\ExplFileName}{
                picturecommands:~page~\l_pdfpw_picturecommands_page_number_tl~~out~of~range
            }
        }
    }
}


\cs_new_protected:Npn \__pdfpw_includepdfpage:nn #1#2 {
    % ------------------------------------------------------------------------ %
    % this function reads previously created data structures to create first
    % (optional) argument to \includepdf (list of key-value pairs)
    % single \includepdf "call" includes only one PDF page
    % thanks to this it is possible to use picturecommand* to put different stuff
    % on each page, since each page is "the first" for \includepdf
    %
    % args:
    %     #1 - page number
    %     #2 - PDF file name
    % ------------------------------------------------------------------------ %
    %
    % ------------------------------------------------------------------------ %
    % create property list with key-value pairs for \includepdf
    % (one page at a time)
    % ------------------------------------------------------------------------ %
    % TODO: differentiate between the situation when the user gives empty
    % linkname={}, picturecommand={}, etc. and the situation when the user
    % does not use these options at all
    \prop_gclear_new:N \l_pdfpw_includepdf_prop
    % pages
    \prop_put:Nnn \l_pdfpw_includepdf_prop {pages} {#1}
    % linkname
    \prop_put:NnV \l_pdfpw_includepdf_prop {linkname} {\l_pdfpw_linkname_tl}
    % addtotoc
    \prop_get:NnNT \l_pdfpw_per_page_addtotoc_prop {#1} \l_tmpa_seq {
        \clist_set_from_seq:NN \l_tmpa_clist \l_tmpa_seq
        \prop_put:NnV \l_pdfpw_includepdf_prop {addtotoc} \l_tmpa_clist
    }
    % addtolist
    \prop_get:NnNT \l_pdfpw_per_page_addtolist_prop {#1} \l_tmpa_seq {
        \clist_set_from_seq:NN \l_tmpa_clist \l_tmpa_seq
        \prop_put:NnV \l_pdfpw_includepdf_prop {addtolist} \l_tmpa_clist
    }
    % picturecommand
    \prop_put:NnV \l_pdfpw_includepdf_prop {picturecommand} {\l_pdfpw_picturecommand_tl}
    % picturecommands -> picturecommand*
    \prop_get:NnNT \l_pdfpw_per_page_picturecommands_prop {#1} \l_tmpa_tl {
        \prop_put:NnV \l_pdfpw_includepdf_prop {picturecommand*} \l_tmpa_tl
    }
    % ------------------------------------------------------------------------ %
    % convert property list to a token list
    % ------------------------------------------------------------------------ %
    \tl_gclear_new:N \l_pdfpw_includepdf_keyval_tl
    \prop_map_inline:Nn \l_pdfpw_includepdf_prop {
        \prop_get:NnN \l_pdfpw_includepdf_prop {##1} \l_tmpa_tl
        \tl_if_empty:NF
            \l_pdfpw_includepdf_keyval_tl
            {\tl_put_right:Nn \l_pdfpw_includepdf_keyval_tl {,}}
        \tl_put_right:Nn \l_pdfpw_includepdf_keyval_tl {##1}
        \tl_put_right:Nn \l_pdfpw_includepdf_keyval_tl {=}
        \__pdfpw_tl_put_right_braced:NV \l_pdfpw_includepdf_keyval_tl \l_tmpa_tl
    }
    \tl_set_rescan:Nno  % ! proper catcodes must be set !
        \l_pdfpw_includepdf_keyval_tl
        { }
        { \l_pdfpw_includepdf_keyval_tl }  
    % \tl_analysis_show:N \l_pdfpw_includepdf_keyval_tl  % for debugging catcodes
    % \tl_show:N \l_pdfpw_includepdf_keyval_tl
    %
    % ------------------------------------------------------------------------ %
    % call \includepdf from pdfpages package:
    % ------------------------------------------------------------------------ %
    \expandafter\includepdf\expandafter[\l_pdfpw_includepdf_keyval_tl]{#2}
    % ------------------------------------------------------------------------ %
}


\cs_new_protected:Npn \__pdfpw_includepdfpages:n #1 {
    % ------------------------------------------------------------------------ %
    % this function loops through all page numbers in a page sequence
    % and calls the function that uses \includepdf to do the magic
    % ------------------------------------------------------------------------ %
    \seq_map_inline:Nn \l_pdfpw_pages_seq {  % for each page
        \__pdfpw_includepdfpage:nn {##1} {#1}
    }
}


\NewDocumentCommand{\MyIncludePdf}{ O{} m }{
    % ------------------------------------------------------------------------ %
    % this is a "public" command to be used in the document
    % has similar interface to \includepdf
    % see pdfpwkeys definition to see a description of the options
    % ------------------------------------------------------------------------ %
    \group_begin:
    % read key-value pairs from the first optional argument
    \keys_set:nn { pdfpwkeys } { #1 }
    % generate sequence of page numbers
    \__pdfpw_process_pages:
    % repack data from addtotoc, addtolist and picturecommands options
    % to a "per-page"-friendly format
    \__pdfpw_process_addtotoc:
    \__pdfpw_process_addtolist:
    \__pdfpw_process_picturecommands:
    % call \includepdf for each page separately:
    \__pdfpw_includepdfpages:n {#2}
    \group_end:
}
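
A side note on the error reporting above, as a minimal sketch rather than a change to the package: \msg_fatal:nn takes a module name and a message name, so a message text passed in place of the name is likely to be reported as an unknown message. One way to get the intended text is to declare the message first with \msg_new:nnn and then pass the variable parts as arguments. The module name pdfpwdemo and the message name bad-clist-length are hypothetical:

```latex
\documentclass{article}
\ExplSyntaxOn
% Declare the message once; #1 and #2 are filled in when it is issued.
\msg_new:nnn { pdfpwdemo } { bad-clist-length }
  { The~'#1'~list~should~contain~a~multiple~of~#2~elements. }

% #1 = clist variable, #2 = option name, #3 = required multiple
\cs_new_protected:Npn \pdfpwdemo_check_clist_length:Nnn #1#2#3
  {
    \int_compare:nNnF
      { \int_mod:nn { \clist_count:N #1 } {#3} } = { 0 }
      { \msg_fatal:nnnn { pdfpwdemo } { bad-clist-length } {#2} {#3} }
  }

\clist_new:N \l_pdfpwdemo_test_clist
\clist_set:Nn \l_pdfpwdemo_test_clist { 1, section, 1, {Title}, label }
% five elements, so this check passes silently
\pdfpwdemo_check_clist_length:Nnn \l_pdfpwdemo_test_clist { addtotoc } { 5 }
\ExplSyntaxOff
\begin{document}
The check passed.
\end{document}
```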

main.tex:

Code:

\documentclass{book}
\usepackage{pdfpages}
\includepdfset{pages=-,frame,scale=0.65,pagecommand={},link}
\usepackage{imakeidx}
\makeindex
\usepackage{pdfpw}  % this one wraps \includepdf into \MyIncludePdf
\usepackage{hyperref}


\begin{document}

\mainmatter

\tableofcontents

\chapter{Papers}

\MyIncludePdf[
    linkname=pdf:Pdf1,
    pages={1-3},
    pageoffset={0},
    addtotoc={
        1, section, 1,
        {Observation of Gravitational Waves from a Binary Black Hole Merger}, % an attempt to \textbf{style} this text results in an error, also with plain \includepdf (if hyperref is active)
        sec:Pdf1Gravity,
        1, subsection, 2,
        {Introduction, with, commas},
        sec:Pdf1Sec1Intro,
        2, subsection, 2,
        {Observation},
        sec:Pdf1SecObservation
    },
    addtolist={
        2, figure,
        {Fig~1. The gravitational-wave event \emph{GW150914}},
        fig:Pdf1Fig1GravitationalWaveEvent,
        3, table,
        {Table~1. Some nonexistent table},
        fig:Pdf1Tab1Dummy
    },
    picturecommand={
        \put(50,50){This is a standard picturecommand (put on all pdf pages), commas, are, fine}
    },
    picturecommands={
        1, {\put(50,70){picturecommand* for PDF page 1}\index{\textsc{Term A}}},
        2, {\put(50,70){picturecommand* for PDF page 2}\index{Term B}},
        3, {\put(50,70){picturecommand* for PDF page 3}\index{Term C}}
    }
    ]{1602.03837.pdf}


\MyIncludePdf[
    linkname=pdf:Pdf2,
    pages={1-4},
    pageoffset={0},
    addtotoc={
        1, section, 1,
        {A radiocarbon spike at 14 300 cal yr BP in subfossil trees provides the impulse response function of the global carbon cycle during the Late Glacial},
        sec:Pdf2Title,
        2, subsection, 2,
        {Introduction},
        sec:Pdf2Sec1Intro,
        3, subsection, 2,
        {Sites and materials},
        sec:Pdf2SecSitesMaterials
    },
    addtolist={
        4, figure,
        {Fig~1. Map of \texttt{Southeastern France}},
        fig:Pdf2Fig1,
        3, table,
        {Table~1. Another \textbf{nonexistent} table},
        fig:Pdf2Tab1Dummy
    },
    picturecommand={
        \put(150,750){Another \textit{picturecommand} (all pages)}
    },
    picturecommands={
        1, {\put(150,730){picturecommand* for \textsc{Pdf} page 1}\index{Term D}},
        2, {\put(150,730){picturecommand* for \textsc{Pdf} page 2}\index{Term E}},
        3, {\put(150,730){picturecommand* for \textsc{Pdf} page 3}\index{Term F}},
        4, {\put(150,730){picturecommand* for \textsc{Pdf} page 4}\index{Term G}}
    }
    ]{bard-et-al-2023-a-radiocarbon-spike-at-14-300-cal-yr-bp-in-subfossil-trees-provides-the-impulse-response-function-of.pdf}
    

\MyIncludePdf[
    pages={6-6}  % {6} won't work currently
    ]{bard-et-al-2023-a-radiocarbon-spike-at-14-300-cal-yr-bp-in-subfossil-trees-provides-the-impulse-response-function-of.pdf}


\listoffigures

\listoftables

\printindex

\end{document}
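
The limitation noted in the last call (pages={6} is not accepted, only pages={6-6}) could be lifted along the following lines. This is a minimal sketch with hypothetical \rangedemo_... names, not part of pdfpw.sty: a specification without a hyphen is treated as a one-page range. It does not validate the input and does not handle other pdfpages formats such as pages=- or comma-separated lists:

```latex
\documentclass{article}
\ExplSyntaxOn
\seq_new:N \l_rangedemo_parts_seq
\int_new:N \l_rangedemo_first_int
\int_new:N \l_rangedemo_last_int

% Parse "6" or "2-4" into first and last page numbers.
\cs_new_protected:Npn \rangedemo_parse:n #1
  {
    \seq_set_split:Nnn \l_rangedemo_parts_seq { - } {#1}
    \int_set:Nn \l_rangedemo_first_int
      { \seq_item:Nn \l_rangedemo_parts_seq { 1 } }
    \int_compare:nNnTF { \seq_count:N \l_rangedemo_parts_seq } = { 1 }
      { % single page: last page equals first page
        \int_set_eq:NN \l_rangedemo_last_int \l_rangedemo_first_int
      }
      { % range: second part is the last page
        \int_set:Nn \l_rangedemo_last_int
          { \seq_item:Nn \l_rangedemo_parts_seq { 2 } }
      }
  }

% Typeset the parsed range, for checking only.
\NewDocumentCommand { \ShowRange } { m }
  {
    \rangedemo_parse:n {#1}
    \int_use:N \l_rangedemo_first_int -- \int_use:N \l_rangedemo_last_int
  }
\ExplSyntaxOff
\begin{document}
\ShowRange{6} and \ShowRange{2-4}
\end{document}
```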