Convert ODT to PDF with Pandoc and LaTeX

Convert ODT to PDF with Pandoc and LaTeX
  • Receive an ODT file (OpenDocument Text Document).
  • Everyone: opens the file with either LibreOffice or even Microsoft Office nowadays, apparently.
  • Me: uses Pandoc and LaTeX to convert the file to PDF and read it in Evince because I don’t have LibreOffice installed and I’m too lazy to upload the document to Google Docs.

I needed to review an addendum to a rental contract. (I moved! I’ll write about that later.) The addendum was sent to me in ODT format. At the time, my desktop pc was still packed in a box. On my laptop (a 2011 MacBook Air with Ubuntu 20.04) I only have the most essential software installed, which for me doesn’t include an office suite. I could install LibreOffice, but why make it easy if I can also do it the hard way? 😀

I do have Evince installed, which is a lightweight PDF viewer. To convert ODT to PDF I’m using Pandoc, which is a Swiss army knife for converting document formats. For PDF it needs the help of LaTeX, a document preparation system for typesetting.

First I installed the required software:

$ sudo apt install pandoc texlive texlive-latex-extra
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libapache-pom-java libcommons-logging-java libcommons-parent-java libfontbox-java libpdfbox-java preview-latex-style texlive-base texlive-binaries
  texlive-fonts-recommended texlive-latex-base texlive-latex-recommended texlive-pictures texlive-plain-generic tipa
Suggested packages:
  libavalon-framework-java libcommons-logging-java-doc libexcalibur-logkit-java liblog4j1.2-java texlive-xetex texlive-luatex pandoc-citeproc
  context wkhtmltopdf librsvg2-bin groff ghc php python r-base-core libjs-mathjax node-katex perl-tk xzdec texlive-fonts-recommended-doc
  texlive-latex-base-doc python3-pygments icc-profiles libfile-which-perl libspreadsheet-parseexcel-perl texlive-latex-extra-doc
  texlive-latex-recommended-doc texlive-pstricks dot2tex prerex ruby-tcltk | libtcltk-ruby texlive-pictures-doc vprerex
The following NEW packages will be installed:
  libapache-pom-java libcommons-logging-java libcommons-parent-java libfontbox-java libpdfbox-java pandoc preview-latex-style texlive texlive-base
  texlive-binaries texlive-fonts-recommended texlive-latex-base texlive-latex-extra texlive-latex-recommended texlive-pictures texlive-plain-generic
  tipa
0 upgraded, 17 newly installed, 0 to remove and 1 not upgraded.
Need to get 116 MB of archives.
After this operation, 448 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Just to compare, installing LibreOffice Writer would actually use less disk space. Pandoc is a lot faster though.

$ sudo apt install libreoffice-writer
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libabw-0.1-1 libboost-date-time1.71.0 libboost-filesystem1.71.0 libboost-iostreams1.71.0 libboost-locale1.71.0 libclucene-contribs1v5
  libclucene-core1v5 libcmis-0.5-5v5 libe-book-0.1-1 libeot0 libepubgen-0.1-1 libetonyek-0.1-1 libexttextcat-2.0-0 libexttextcat-data libgpgmepp6
  libjuh-java libjurt-java liblangtag-common liblangtag1 libmhash2 libmwaw-0.3-3 libmythes-1.2-0 libneon27-gnutls libodfgen-0.1-1 liborcus-0.15-0
  libraptor2-0 librasqal3 librdf0 libreoffice-base-core libreoffice-common libreoffice-core libreoffice-math libreoffice-style-colibre
  libreoffice-style-tango librevenge-0.0-0 libridl-java libuno-cppu3 libuno-cppuhelpergcc3-3 libuno-purpenvhelpergcc3-3 libuno-sal3
  libuno-salhelpergcc3-3 libunoloader-java libwpd-0.10-10 libwpg-0.3-3 libwps-0.4-4 libxmlsec1 libxmlsec1-nss libyajl2 python3-uno uno-libs-private
  ure
Suggested packages:
  raptor2-utils rasqal-utils librdf-storage-postgresql librdf-storage-mysql librdf-storage-sqlite librdf-storage-virtuoso redland-utils
  libreoffice-base gstreamer1.0-plugins-bad tango-icon-theme fonts-crosextra-caladea fonts-crosextra-carlito libreoffice-java-common
The following NEW packages will be installed:
  libabw-0.1-1 libboost-date-time1.71.0 libboost-filesystem1.71.0 libboost-iostreams1.71.0 libboost-locale1.71.0 libclucene-contribs1v5
  libclucene-core1v5 libcmis-0.5-5v5 libe-book-0.1-1 libeot0 libepubgen-0.1-1 libetonyek-0.1-1 libexttextcat-2.0-0 libexttextcat-data libgpgmepp6
  libjuh-java libjurt-java liblangtag-common liblangtag1 libmhash2 libmwaw-0.3-3 libmythes-1.2-0 libneon27-gnutls libodfgen-0.1-1 liborcus-0.15-0
  libraptor2-0 librasqal3 librdf0 libreoffice-base-core libreoffice-common libreoffice-core libreoffice-math libreoffice-style-colibre
  libreoffice-style-tango libreoffice-writer librevenge-0.0-0 libridl-java libuno-cppu3 libuno-cppuhelpergcc3-3 libuno-purpenvhelpergcc3-3
  libuno-sal3 libuno-salhelpergcc3-3 libunoloader-java libwpd-0.10-10 libwpg-0.3-3 libwps-0.4-4 libxmlsec1 libxmlsec1-nss libyajl2 python3-uno
  uno-libs-private ure
0 upgraded, 52 newly installed, 0 to remove and 1 not upgraded.
Need to get 78,5 MB of archives.
After this operation, 283 MB of additional disk space will be used.
Do you want to continue? [Y/n] n
Abort.

Next, converting the file. It’s possible to tell Pandoc which file formats to use with the -f (from) and -t (to) switches, but it can usually guess correctly based on the file extensions.

$ time pandoc 2022-06-house-contract-adendum.odt -o 2022-06-house-contract-adendum.pdf

real	0m0,519s
user	0m0,475s
sys	0m0,059s

It took only half a second to convert the file. Opening LibreOffice takes a bit more time on this old laptop.

You can see the PDF document properties with pdfinfo:

$ pdfinfo 2022-06-house-contract-adendum.pdf 
Title:          
Subject:        
Keywords:       
Author:         
Creator:        LaTeX with hyperref
Producer:       pdfTeX-1.40.20
CreationDate:   Sat Jun 11 23:32:30 2022 CEST
ModDate:        Sat Jun 11 23:32:30 2022 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      612 x 792 pts (letter)
Page rot:       0
File size:      64904 bytes
Optimized:      no
PDF version:    1.5

I don’t want it in letter format, I want A4:

$ time pandoc -V papersize:a4 -o 2022-06-house-contract-adendum.pdf 2022-06-house-contract-adendum.odt

real	0m0,520s
user	0m0,469s
sys	0m0,060s
$ pdfinfo 2022-06-house-contract-adendum.pdf 
Title:          
Subject:        
Keywords:       
Author:         
Creator:        LaTeX with hyperref
Producer:       pdfTeX-1.40.20
CreationDate:   Sat Jun 11 23:40:16 2022 CEST
ModDate:        Sat Jun 11 23:40:16 2022 CEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          2
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      64935 bytes
Optimized:      no
PDF version:    1.5

Then I could open the file with evince 2022-06-house-contract-adendum.pdf.

And yes, I know that addendum is with double d. 🙂

Leave a Reply