Output encoding

Beautiful Soup also supports encodings for the text it outputs. Some output methods, such as prettify(), produce UTF-8 output by default: even if the original document used a different encoding, such as ISO 8859-2, the output will be in UTF-8. For example, the following HTML content is declared as ISO8859-2:

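from bs4 import BeautifulSoup

html_markup = """
<html>
  <meta http-equiv="Content-Type" content="text/html;charset=ISO8859-2"/>
  <p>cédille (from French), is a hook or tail ( ž )  added under certain letters as a diacritical mark to modify their pronunciation
  </p>"""
# In Python 3 a string literal is already Unicode, so encode it to
# ISO 8859-2 bytes to give Beautiful Soup an encoding to detect.
soup = BeautifulSoup(html_markup.encode("iso-8859-2"), "lxml")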

The soup.original_encoding will give us the encoding as ISO8859-2, which is true for ...
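
To make the difference between input and output encoding concrete, here is a minimal sketch. It assumes a shortened, hypothetical version of the markup above, uses the built-in html.parser instead of lxml to avoid an extra dependency, and passes from_encoding so that the input encoding is stated rather than detected. It relies on the optional encoding argument of prettify(), which returns bytes in the requested encoding instead of a Unicode string.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, encoded as ISO 8859-2 bytes for illustration.
markup = (
    '<html><head>'
    '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-2"/>'
    '</head><body><p>cédille ( ž )</p></body></html>'
).encode("iso-8859-2")

# from_encoding states the input encoding explicitly instead of relying
# on detection; html.parser is used only to avoid the lxml dependency.
soup = BeautifulSoup(markup, "html.parser", from_encoding="iso-8859-2")

# Encoding that Beautiful Soup used to decode the input.
print(soup.original_encoding)

# With no arguments, prettify() returns a Unicode string.
pretty_text = soup.prettify()

# With an encoding argument, prettify() returns bytes in that encoding.
pretty_utf8 = soup.prettify(encoding="utf-8")
pretty_latin2 = soup.prettify(encoding="iso-8859-2")

# The character ž is two bytes in UTF-8 and one byte in ISO 8859-2.
print("ž".encode("utf-8") in pretty_utf8)
print("ž".encode("iso-8859-2") in pretty_latin2)
```

The two byte strings contain the same document but differ in how the accented characters are stored, which is why the output encoding matters when the result is written to a file or sent over the network.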
