The etgen.html module defines an ElementTree Builder for generating HTML documents.


E is the global ElementTree Builder object.


>>> from etgen.html import E, tostring, tostring_pretty, CLASS
>>> import lxml.usedoctest
>>> html = E.html(
...            E.head( E.title("Hello World") ),
...            E.body(
...              E.h1("Hello World !"),
...              CLASS("main")
...            )
...        )
>>> print(tostring_pretty(html))
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body class="main">
    <h1>Hello World !</h1>
  </body>
</html>
>>> kw = dict(title='Ein süßes Beispiel')
>>> kw.update(href="foo/bar.html")
>>> btn = E.button(type='button', **CLASS('x-btn-text x-tbar-upload'))
>>> html = E.a(btn, **kw)
>>> print(tostring_pretty(html))
<a href="foo/bar.html" title="Ein süßes Beispiel">
  <button class="x-btn-text x-tbar-upload" type="button"/>
</a>
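
Because class is a reserved word in Python, it cannot be written as a keyword argument, which is why the examples above go through CLASS. The following sketch illustrates the idea with the standard library's xml.etree.ElementTree rather than with etgen itself; class_attr is a hypothetical stand-in and not etgen's actual CLASS implementation.

```python
import xml.etree.ElementTree as ET


def class_attr(*names):
    """Hypothetical helper: return a dict that sets the ``class`` attribute."""
    return {"class": " ".join(names)}


# Unpacking the dict with ** works around ``class`` being a Python keyword.
btn = ET.Element("button", type="button",
                 **class_attr("x-btn-text", "x-tbar-upload"))
print(btn.get("class"))
```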

You can also do the opposite, i.e. parse HTML:

>>> from lxml import etree
>>> E_raw = etree.fromstring
>>> html = E_raw('''<a href="foo/bar.html"
... title="Ein s&#252;&#223;es Beispiel">
... <button class="x-btn-text x-tbar-upload" type="button"/>
... </a>''')
>>> print(tostring_pretty(html))
<a href="foo/bar.html" title="Ein süßes Beispiel">
  <button class="x-btn-text x-tbar-upload" type="button"/>
</a>
>>> print(tostring(E_raw(
...     '<ul type="disc"><li>First</li><li>Second</li></ul>')))
<ul type="disc"><li>First</li><li>Second</li></ul>
>>> html = E.div(E.p("First"), E.p("Second"))
>>> print(tostring_pretty(html))
<div>
  <p>First</p>
  <p>Second</p>
</div>
>>> html.attrib['class'] = 'htmlText'
>>> print(tostring_pretty(html))
<div class="htmlText">
  <p>First</p>
  <p>Second</p>
</div>
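
The parsing examples above rely on two things: the parser decodes numeric character references such as &#252;, and the attributes of an element can be changed after it has been created. A minimal sketch of both, using the standard library's xml.etree.ElementTree instead of lxml (an assumption made only to keep the example self-contained):

```python
import xml.etree.ElementTree as ET

# Numeric character references are decoded by the XML parser.
link = ET.fromstring(
    '<a href="foo/bar.html" title="Ein s&#252;&#223;es Beispiel"/>')
print(link.get("title"))

# Attributes of a parsed or generated element can be changed afterwards.
link.attrib["class"] = "htmlText"
print(sorted(link.attrib))
```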

Avoid self-closing tags

lxml generates self-closing tags for elements without children:

>>> print(tostring(E.div()))
<div/>

Some environments do not handle empty <div> elements correctly and interpret a self-closing <div/> as an opening <div> tag (don’t ask me why). You can avoid the self-closing tag by setting the element's text attribute to an empty string:

>>> html = E.div()
>>> html.text = ""
>>> print(tostring(html))
<div></div>

Note that you must set text explicitly. Simply specifying it when instantiating the element is not enough:

>>> print(tostring(E.div("")))
>>> print(tostring(E.div(" ")))

The real solution would be to use the “html” method when serializing the tree to HTML:

>>> print(tostring(E.div(), method="html"))
<div></div>

TODO: The “html” method used to be the default (see the disabled line in the code of tostring()), and I don’t remember why we disabled it. I suggest re-enabling it and testing thoroughly whether this causes regressions (and, if so, why).
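
For comparison, here is how the same question looks with the standard library's xml.etree.ElementTree. This is only a sketch: it does not use etgen or lxml, the standard library writes the self-closing form with a space before the slash, and short_empty_elements is a flag of the standard library's tostring().

```python
import xml.etree.ElementTree as ET

div = ET.Element("div")

# Default XML serialization collapses the empty element.
as_xml = ET.tostring(div, encoding="unicode")

# The "html" method writes a separate closing tag for <div>.
as_html = ET.tostring(div, encoding="unicode", method="html")

# With XML output, the standard library can also be told not to collapse.
expanded = ET.tostring(div, encoding="unicode", short_empty_elements=False)

print(as_xml)
print(as_html)
print(expanded)
```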

Converting text lines to a paragraph

The lines2p() function converts a list of text lines into a paragraph (<p>) with one <br> between each pair of lines. If the optional min_height is given, empty lines are added if necessary to reach that height.


>>> from etgen.html import lines2p
>>> print(tostring(lines2p(['first', 'second'])))
>>> print(tostring(lines2p(['first', 'second'], min_height=5)))

If min_height is specified and lines contains more items than that, the result is not truncated:

>>> print(tostring(lines2p(['a', 'b', 'c', 'd', 'e'], min_height=4)))

This also works:

>>> print(tostring(lines2p([], min_height=5)))
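
The behaviour described in this section can be sketched roughly as follows. This is not etgen's actual code: lines2p_sketch is a hypothetical name, the sketch uses the standard library's xml.etree.ElementTree, and it assumes that min_height counts lines, so that padding to five lines yields four <br> elements.

```python
import xml.etree.ElementTree as ET


def lines2p_sketch(lines, min_height=0):
    """Hypothetical re-implementation of the idea behind lines2p()."""
    lines = list(lines)
    # Pad with empty lines up to min_height; a longer list is never truncated
    # because multiplying a list by a negative number gives an empty list.
    lines += [""] * (min_height - len(lines))
    p = ET.Element("p")
    if lines:
        p.text = lines[0]
    for line in lines[1:]:
        # Each further line follows a <br>, stored as the tail of that <br>.
        br = ET.SubElement(p, "br")
        br.tail = line
    return p


para = lines2p_sketch(["first", "second"], min_height=5)
print(len(para.findall("br")))
```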