pandoc is a tool for converting documents from one format into another.
You could use it to read an HTML document and convert it to markdown.
I prefer to use it to produce nice PDFs, writing in markdown instead of LaTeX.
To do this, you probably want both pandoc and texlive-full.
Unfortunately, this can be a rather large thing to install:
$ sudo apt-get install pandoc texlive-full
Reading package lists... Done
Building dependency tree
Reading state information... Done
…
The following NEW packages will be installed
aglfn chktex cm-super cm-super-minimal context context-modules
dvidvi dvipng feynmf fonts-cabin fonts-comfortaa
fonts-crosextra-caladea fonts-crosextra-carlito fonts-ebgaramond
fonts-ebgaramond-extra fonts-font-awesome fonts-freefont-otf
fonts-gfs-artemisia fonts-gfs-baskerville fonts-gfs-bodoni-classic
fonts-gfs-complutum fonts-gfs-didot fonts-gfs-didot-classic
fonts-gfs-gazis fonts-gfs-neohellenic fonts-gfs-olga
fonts-gfs-porson fonts-gfs-solomos fonts-gfs-theokritos
fonts-hosny-amiri fonts-inconsolata fonts-ipaexfont-gothic
fonts-ipaexfont-mincho fonts-junicode fonts-lato
fonts-linuxlibertine fonts-lmodern fonts-lobster fonts-lobstertwo
fonts-oflb-asana-math fonts-roboto fonts-sil-gentium
fonts-sil-gentium-basic fonts-sil-gentiumplus fonts-stix
fonts-texgyre fragmaster lacheck latex-cjk-all latex-cjk-chinese
latex-cjk-chinese-arphic-bkai00mp latex-cjk-chinese-arphic-bsmi00lp
latex-cjk-chinese-arphic-gbsn00lp latex-cjk-chinese-arphic-gkai00mp
latex-cjk-common latex-cjk-japanese latex-cjk-japanese-wadalab
latex-cjk-korean latex-cjk-thai latexdiff latexmk lcdf-typetools
libfile-homedir-perl libfile-which-perl libintl-perl libplot2c2
libpoppler-qt4-4 libpotrace0 libpstoedit0v5 libptexenc1 libsynctex1
libtexlua52 libtexluajit2 libtext-unidecode-perl libxml-libxml-perl
libxml-namespacesupport-perl libxml-sax-base-perl
libxml-sax-expat-perl libxml-sax-perl libzzip-0-13 lmodern m-tx
musixtex pandoc pandoc-data pfb2t1c2pfb pmx prerex
preview-latex-style prosper ps2eps pstoedit psutils purifyeps
tex-common tex-gyre tex4ht tex4ht-common texinfo texlive-base
texlive-bibtex-extra texlive-binaries texlive-extra-utils
texlive-font-utils texlive-fonts-extra texlive-fonts-extra-doc
texlive-fonts-recommended texlive-fonts-recommended-doc
texlive-formats-extra texlive-full texlive-games
texlive-generic-extra texlive-generic-recommended
texlive-humanities texlive-humanities-doc texlive-lang-african
texlive-lang-arabic texlive-lang-chinese texlive-lang-cjk
texlive-lang-cyrillic texlive-lang-czechslovak texlive-lang-english
texlive-lang-european texlive-lang-french texlive-lang-german
texlive-lang-greek texlive-lang-indic texlive-lang-italian
texlive-lang-japanese texlive-lang-korean texlive-lang-other
texlive-lang-polish texlive-lang-portuguese texlive-lang-spanish
texlive-latex-base texlive-latex-base-doc texlive-latex-extra
texlive-latex-extra-doc texlive-latex-recommended
texlive-latex-recommended-doc texlive-luatex texlive-math-extra
texlive-metapost texlive-metapost-doc texlive-music texlive-omega
texlive-pictures texlive-pictures-doc texlive-plain-extra
texlive-pstricks texlive-pstricks-doc texlive-publishers
texlive-publishers-doc texlive-science texlive-science-doc
texlive-xetex tipa ttf-adf-accanthis ttf-adf-gillius
ttf-adf-universalis vprerex
0 to upgrade, 161 to newly install, 0 to remove and 0 not to upgrade.
Need to get 1,782 MB of archives.
After this operation, 3,466 MB of additional disk space will be used.
You may find it convenient to define a simple make rule for turning pandoc markdown documents into PDFs, as follows (the recipe line must begin with a tab character):
$ cat >Makefile <<'EOF'
%.pdf : %.mdwn
	pandoc -o $@ $<
EOF
Now you may write markdown files and convert them into PDFs simply by running make foo.pdf to turn foo.mdwn into foo.pdf.
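The rule can be extended so that a bare make builds every document in the directory. The following is only a sketch, assuming GNU make; the variable names SOURCES and PDFS are illustrative, and printf is used for the recipe line so that its leading tab is explicit. Note that it overwrites any Makefile in the current directory.

```shell
# Write the variable definitions, the default target and the pattern rule header.
# SOURCES and PDFS are illustrative names, not part of the original rule.
printf '%s\n' \
    'SOURCES := $(wildcard *.mdwn)' \
    'PDFS := $(SOURCES:.mdwn=.pdf)' \
    '' \
    'all: $(PDFS)' \
    '' \
    '%.pdf: %.mdwn' \
    > Makefile
# Append the recipe line; \t makes printf emit the tab that make requires.
printf '\tpandoc -o $@ $<\n' >> Makefile
```

With this in place, running make with no arguments should build a PDF for each .mdwn file, while make foo.pdf continues to work as before.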
If you have the document open in a PDF viewer such as evince and then generate a new version, the viewer will refresh itself to show the new version.
You can make your PDF viewer show live updates to your document by combining this make rule with an inotify watch command:
$ while true; do inotifywait -e close_write,move_self foo.mdwn; make foo.pdf; done
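The loop above keeps running even when inotifywait fails, for instance because foo.mdwn does not exist. A variant that stops as soon as inotifywait exits with a non-zero status could be wrapped in a shell function. This is a sketch: watch_pdf is a hypothetical name, and it assumes inotifywait (from the inotify-tools package) and the make rule described earlier.

```shell
# Hypothetical helper: rebuild the PDF each time the given source file changes.
watch_pdf() {
    src=$1
    pdf=${src%.*}.pdf    # foo.mdwn -> foo.pdf
    # Block until the file is written or replaced, then rebuild.
    # The loop ends when inotifywait reports failure.
    while inotifywait -e close_write,move_self "$src"; do
        make "$pdf"
    done
}
```

It would be started with watch_pdf foo.mdwn and interrupted with Ctrl-C when you are done editing.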
You can write in-line LaTeX to include content that is not expressible in markdown.
Markdown also lacks a way of providing metadata, but pandoc has an extension called yaml_metadata_block, which parses a header delimited by --- lines as a YAML document, for example:
---
title: Foo
author: Joe Bloggs
header-includes:
- \usepackage{bytefield}
geometry: margin=3cm
---
# Title
This is an example bytefield figure:
\begin{bytefield}{16}
\wordbox{1}{A 16-bit field} \\
\bitbox{8}{8 bits} & \bitbox{8}{8 more bits} \\
\wordbox{2}{A 32-bit field. Note that text wraps within the box.}
\end{bytefield}
Without the header, you would need to write a separate template file containing the \usepackage{bytefield} directive, and invoke pandoc with --template=use-bytefield.latex.
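If the YAML header is not an option, a lighter alternative to a separate template may be pandoc's --include-in-header option, which inserts the contents of a file into the header of the default template. A minimal sketch, in which the function name pdf_with_bytefield and the file name use-bytefield.tex are hypothetical:

```shell
# Hypothetical helper: build a PDF from a markdown file, pulling in the
# bytefield package through a header snippet rather than a YAML block.
pdf_with_bytefield() {
    src=$1
    header=use-bytefield.tex    # hypothetical file name
    # Write the preamble line that header-includes would otherwise supply.
    printf '%s\n' '\usepackage{bytefield}' > "$header"
    pandoc --include-in-header="$header" -o "${src%.*}.pdf" "$src"
}
```

Calling pdf_with_bytefield foo.mdwn would then be expected to produce foo.pdf, although the YAML header remains the more self-contained approach because the document carries its own requirements.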