Doc bug: include_meta_content_type doesn't

Bug #612843 reported by Jean Jordaan
This bug affects 1 person
Affects : lxml
Status : Confirmed
Importance : Low
Assigned to : Unassigned
Milestone : (not set)

Bug Description

The docs for lxml.html.tostring state:

  if include_meta_content_type is true this will create a ``<meta http-equiv="Content-Type" ...>`` tag in the head

It does not:

In [81]: new_doc = E.HTML(E.HEAD('title'), E.BODY(u'content ë ç ¥'))

In [85]: tostring(new_doc, encoding='utf-8', include_meta_content_type=True)
Out[85]: '<html><head>title</head><body>content \xc3\x83\xc2\xab \xc3\x83\xc2\xa7 \xc3\x82\xc2\xa5</body></html>'

To get the meta tag, I have to create it explicitly:

In [87]: new_doc = E.HTML(E.HEAD(E.META({'http-equiv':"Content-Type", 'content':"text/html; charset=utf-8"}),'title'), E.BODY(u'content ë ç ¥'))

Now tostring works the same, with or without include_meta_content_type:

In [90]: tostring(new_doc, include_meta_content_type=True, encoding='utf-8')
Out[90]: '<html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type">title</head><body>content \xc3\x83\xc2\xab \xc3\x83\xc2\xa7 \xc3\x82\xc2\xa5</body></html>'

In [91]: tostring(new_doc, encoding='utf-8')
Out[91]: '<html><head><meta content="text/html; charset=utf-8" http-equiv="Content-Type">title</head><body>content \xc3\x83\xc2\xab \xc3\x83\xc2\xa7 \xc3\x82\xc2\xa5</body></html>'

Is this the proper way to create HTML with encoding specified using lxml?

Python : (2, 6, 5, 'final', 0)
lxml.etree : (2, 2, 4, 0)
libxml used : (2, 7, 6)
libxml compiled : (2, 7, 6)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

scoder (scoder) wrote :

I agree that this is a bit quirky. Basically, it simply runs some string post processing after serialisation and tries to strip the tag that way. The original intention was to deal with the <meta> tag that libxml2 explicitly generates in some cases. Apparently not in this case.

Looks like this feature needs a proper redesign at some point...
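
The post-processing described above can be illustrated with a small standalone sketch. This is not lxml's actual implementation: the function name tostring_sketch and the regular expression are assumptions made up for illustration, and the input is already-serialised bytes rather than an element tree. It shows why a flag that only controls stripping can never add a missing tag, and suggests one possible reason the tag in Out[91] survives: a pattern anchored on http-equiv as the first attribute would not match a tag serialised with content first.

```python
import re

# Hypothetical pattern (an assumption, not taken from lxml): matches a
# Content-Type <meta> tag only when http-equiv is the first attribute.
_META_CONTENT_TYPE = re.compile(rb'<meta http-equiv="Content-Type"[^>]*>')


def tostring_sketch(serialised, include_meta_content_type=False):
    """Post-process already-serialised HTML bytes.

    The flag never inserts a tag. When it is false, a matching tag is
    stripped; when it is true, the bytes are returned unchanged.
    """
    if include_meta_content_type:
        return serialised
    return _META_CONTENT_TYPE.sub(b'', serialised)


no_meta = b'<html><head>title</head><body>content</body></html>'
http_equiv_first = (b'<html><head><meta http-equiv="Content-Type" '
                    b'content="text/html; charset=utf-8">title</head></html>')
content_first = (b'<html><head><meta content="text/html; charset=utf-8" '
                 b'http-equiv="Content-Type">title</head></html>')

# The flag cannot add a tag that was never serialised.
print(tostring_sketch(no_meta, include_meta_content_type=True) == no_meta)
# A tag with http-equiv first is stripped by default.
print(b'<meta' in tostring_sketch(http_equiv_first))
# A tag with content first slips past the pattern and survives.
print(b'<meta' in tostring_sketch(content_first))
```

Under these assumptions the three checks print True, False and True, which matches the behaviour in the report: no tag appears in Out[85], and the hand-built tag appears in both Out[90] and Out[91].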

Changed in lxml:
importance: Undecided → Low
status: New → Confirmed