Provide type annotations

Bug #1843791 reported by Daniel Hahler
This bug affects 5 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Beautiful Soup | In Progress | Undecided | Unassigned | |

Bug Description

It would be useful to have type annotations for bs4, to be used with mypy etc.

I quickly generated them with mypy's "stubgen" tool [1] and then started refining some of them by hand, but there is nothing worth publishing yet.

I was wondering whether there are already plans for this, and thought it would be good to have an issue to discuss it and to serve as a reference.

I ran "2to3" on the source first; it is not clear how that step should be handled when the annotations are added in the repository itself.

1: https://mypy.readthedocs.io/en/latest/stubgen.html

Leonard Richardson (leonardr) wrote:

This is an interesting idea, and I would like to get there eventually.

The sticking point, as you've found out, is that the canonical version of the Beautiful Soup code is written in Python 2 and automatically converted to Python 3. I don't see a way to add these annotations without permanently switching to Python 3.

Because Beautiful Soup is frequently used in duct-tape environments, I'm going to keep Python 2 support past the official end-of-life date, but eventually I will drop it, and we can pick up this issue then.

Changed in beautifulsoup:
status: New → Triaged
Daniel Hahler (blueyed) wrote:

So you already plan to add them directly to the code? (Which is good!)

For Python 2, type hints could be written as type comments, which would hopefully carry over into the converted Python 3 version as well (though still as comments there).
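PEP 484 defines a type-comment syntax that Python 2 parses as an ordinary comment while mypy still checks it. A minimal sketch comparing the two spellings; the `split_classes_*` functions are hypothetical and not part of bs4, and the file as a whole needs Python 3 because of the inline version (on Python 2, the `typing` names would come from the PyPI backport):

```python
from typing import List, Optional


def split_classes_comment(value, limit=None):
    # type: (str, Optional[int]) -> List[str]
    """Python 2-compatible: the signature lives in a PEP 484 type comment."""
    parts = value.split()
    return parts if limit is None else parts[:limit]


def split_classes_inline(value: str, limit: Optional[int] = None) -> List[str]:
    """Python 3-only: the same signature written as inline annotations."""
    parts = value.split()
    return parts if limit is None else parts[:limit]


# mypy checks both signatures, but only the inline form exists at runtime.
print(split_classes_comment.__annotations__)  # {}
print(sorted(split_classes_inline.__annotations__))  # ['limit', 'return', 'value']
```

The trade-off is that type comments cost nothing at runtime and parse everywhere, but tools that introspect `__annotations__` cannot see them.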

By the way: maybe it would be better to make Python 3 the default and auto-generate the Python 2 code from it instead? But that is likely not worth the effort.

Alexander Regueiro (alexreg) wrote:

Yes, please! It's now 2021, and type annotations are becoming popular in Python code, especially with tools like mypy around.

Changed in beautifulsoup:
status: Triaged → In Progress
Florian Schulze (florian-schulze) wrote:

With Python 3.8, I currently get an error on the 4.13 branch:
```
../../beautifulsoup/bs4/__init__.py:141: in BeautifulSoup
    element_classes:Dict[type[PageElement], type[Any]] #: :meta private:
E TypeError: 'type' object is not subscriptable
```
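The error arises because annotations in a class body are evaluated when the class is created (unless PEP 563 is in effect), and subscripting the builtin `type`, as in `type[PageElement]`, only became possible in Python 3.9 (PEP 585). A minimal sketch of a 3.8-compatible spelling using `typing.Type`; `PageElement` and `Soup` here are hypothetical stand-ins, not bs4's real classes:

```python
from typing import Any, Dict, Type


class PageElement:
    """Hypothetical stand-in for bs4's PageElement, for illustration only."""


class Soup:
    """Hypothetical stand-in for the BeautifulSoup class body."""

    # This annotation is evaluated when the class is created. On Python 3.8,
    # Dict[type[PageElement], type[Any]] raises
    # "TypeError: 'type' object is not subscriptable"; typing.Type is the
    # pre-3.9 spelling of the same type and also works on 3.8.
    element_classes: Dict[Type[PageElement], Type[Any]]

    def __init__(self) -> None:
        self.element_classes = {}


print(Soup.__annotations__["element_classes"])
```

Either this substitution or deferring evaluation with `from __future__ import annotations` avoids the runtime error; the annotation means the same thing to mypy in both cases.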

Florian Schulze (florian-schulze) wrote:

``from __future__ import annotations`` fixes it (https://peps.python.org/pep-0563/), but using ``|`` for types outside annotations, as in the ``cast()`` call, isn't supported before Python 3.10 (https://peps.python.org/pep-0604/):

```diff
diff --git a/bs4/__init__.py b/bs4/__init__.py
index 46c770f..b2c889a 100644
--- a/bs4/__init__.py
+++ b/bs4/__init__.py
@@ -13,6 +13,7 @@ and/or html5lib is installed, but they are not required.
 For more than you ever wanted to know about Beautiful Soup, see the
 documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
 """
+from __future__ import annotations

 __author__ = "Leonard Richardson (<email address hidden>)"
 __version__ = "4.12.2"
@@ -23,6 +24,7 @@ __license__ = "MIT"
 __all__ = ['BeautifulSoup']

 from collections import Counter
+from typing import Union
 import os
 import re
 import sys
@@ -376,7 +378,7 @@ class BeautifulSoup(Tag):

         # At this point we know markup is a string or bytestring. If
         # it was a file-type object, we've read from it.
-        markup = cast(str|bytes, markup)
+        markup = cast(Union[str, bytes], markup)

         rejections = []
         success = False
```
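The diff works because PEP 563 stores annotations as strings that are never evaluated, so ``str | bytes`` inside an annotation is harmless on 3.8, whereas the first argument of ``cast()`` is an ordinary expression evaluated at runtime. A minimal sketch, assuming a hypothetical ``read_markup`` helper that only mirrors the pattern in the diff (it is not bs4's actual code). Passing the type as a string, ``cast("str | bytes", markup)``, would also avoid runtime evaluation, since ``cast()`` just returns its second argument:

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings

import io
from typing import IO, Union, cast


def read_markup(markup: str | bytes | IO[str] | IO[bytes]) -> str | bytes:
    """Hypothetical helper mirroring the pattern in the diff above."""
    # The "|" annotations above are never evaluated, so they are safe on 3.8.
    if not isinstance(markup, (str, bytes)):
        markup = markup.read()  # file-like object: read its contents
    # cast()'s first argument IS evaluated at runtime: cast(str | bytes, ...)
    # raises TypeError before Python 3.10, while Union[...] works on 3.8.
    return cast(Union[str, bytes], markup)


print(read_markup("<p>hi</p>"))
print(read_markup(io.BytesIO(b"<b>x</b>")))
print(type(read_markup.__annotations__["markup"]))  # <class 'str'>
```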
