pytz includes a hard-coded list of time zone names

Bug #207604 reported by James Henstridge
This bug affects 5 people
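| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| pytz | New | Unknown | | |
| python-tz (Ubuntu) | Fix Released | Medium | Unassigned | |
| python-tz (Ubuntu Jammy) | Fix Released | Undecided | Unassigned | |
| python-tz (Ubuntu Kinetic) | Fix Released | Undecided | Unassigned | |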

Bug Description

[ Impact ]

The Debian/Ubuntu packages of pytz include a patch that skips installing pytz's own zoneinfo database and loads data from the system zoneinfo database instead. This keeps pytz up to date when system time zone updates are installed, which is generally a good idea.

pytz/__init__.py defines the common_timezones and all_timezones lists, which enumerate the time zones in the database pytz was released with. If pytz is using the system database instead, these lists may be wrong (e.g. missing newly added time zones).

all_timezones and common_timezones should be calculated at run time. all_timezones will need to use os.walk to discover what timezones exist. common_timezones should be loaded from zone.tab, plus a hard coded list of extras such as UTC and US/Eastern.
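As a rough illustration of the os.walk discovery, here is a minimal sketch. It assumes the system database lives under /usr/share/zoneinfo, that compiled zone files can be recognized by their TZif magic bytes, and that the posix/ and right/ subtrees plus the localtime and posixrules entries should be skipped (assumptions about a typical Debian layout). discover_all_timezones is a hypothetical name, not the function in the actual Debian patch.

```python
import os

# Assumption: where Debian/Ubuntu install the compiled zoneinfo database.
DEFAULT_ZONEINFO = "/usr/share/zoneinfo"

# Assumption: subtrees that duplicate the main tree with other leap-second handling.
SKIP_DIRS = {"posix", "right"}

# Assumption: compiled files that are not user-facing zone names.
EXCLUDE = {"localtime", "posixrules"}


def is_tzif(path):
    """Return True if the file starts with the TZif magic of compiled zones."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == b"TZif"
    except OSError:
        return False


def discover_all_timezones(root=DEFAULT_ZONEINFO):
    """Walk the zoneinfo tree and return sorted names such as 'Europe/Kyiv'."""
    zones = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # Prune duplicate subtrees in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            # Text files such as zone.tab fail the TZif check and are skipped.
            if rel not in EXCLUDE and is_tzif(path):
                zones.append(rel)
    return sorted(zones)
```

Checking the file header rather than a name list is what lets newly added zones such as America/Ciudad_Juarez show up without any change to pytz itself.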

Every time a new time zone is added to tzdata in an update, pytz cannot use it, because its list of time zones is hard-coded. For jammy and kinetic this applies to America/Ciudad_Juarez (comparing pytz with tzdata 2023c). For bionic the list of missing time zones is even longer:

* America/Ciudad_Juarez
* America/Nuuk
* Asia/Qostanay
* Europe/Kyiv
* Pacific/Kanton

[ Test Plan ]

Two autopkgtest test cases were added:
1. Run the upstream unittests
2. Regression tests covering every case I could come up with.

There are no manual tests, because all of these tests should also run whenever tzdata is updated, to prevent regressions.

To test manually, check that a time zone newly added to tzdata can be loaded:

```
#!/usr/bin/python3
import pytz
pytz.timezone("America/Ciudad_Juarez")
```

This should not raise UnknownTimeZoneError.

[ Where problems could occur ]

Determining the time zones dynamically relies on a correct environment and can fail more easily if the environment is broken. python-tz is used in several places (possibly including the installer). The patch might have a performance impact, because the full list of time zones is computed eagerly (though only once) instead of being constructed lazily.

[ Other Info ]

Upstream recommends this approach: https://github.com/stub42/pytz/issues/91#issuecomment-1356628324

Revision history for this message
Stuart Bishop (stub) wrote :

common_timezones uses data not found in the binary database, so we may not be able to fix this in pytz. In the worst case, we need to provide a mechanism for distributions that package pytz this way to keep this list up to date, or to generate it dynamically from distribution-specific sources.

Changed in pytz:
importance: Undecided → Medium
status: New → Confirmed
James Henstridge (jamesh) wrote :

I suspected that we wouldn't be able to generate common_timezones (unless they hack tzcode to symlink the obsolete time zones instead of hard linking them). I wonder if the tzdata maintainers would be open to installing a file containing the list of time zones that says which ones are not obsolete?
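The hard link versus symlink distinction can be made concrete with a small sketch. It assumes a zoneinfo-style tree on a POSIX file system; classify_zone_files is a hypothetical helper. Symlinked names are detectable as aliases with os.path.islink, while hard-linked names only reveal groups of files sharing one inode, with nothing recording which member is the current zone and which is the obsolete alias.

```python
import os
from collections import defaultdict


def classify_zone_files(root):
    """Return (symlinked, hardlink_groups) for the files under root.

    symlinked: names that are symbolic links, hence detectably aliases.
    hardlink_groups: sets of names sharing one inode; the file system
    does not say which name in a group is canonical.
    """
    symlinked = set()
    by_inode = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            if os.path.islink(path):
                symlinked.add(rel)
                continue
            st = os.stat(path)
            by_inode[(st.st_dev, st.st_ino)].add(rel)
    groups = [names for names in by_inode.values() if len(names) > 1]
    return symlinked, groups
```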

Stuart Bishop (stub) wrote :

Petr Machata suggests loading common_timezones from zone.tab at runtime.

This seems like a great starting point for the list. We might want to manually add a few well known names that people expect, such as US/Eastern, but we do that already when generating the static list.

James Henstridge (jamesh) wrote :

Sounds like a good start. In the long term, it might be worth updating the tzcode programs to produce a file like /usr/share/javazi/ZoneInfoMappings, but perhaps in a format more amenable to memory-mapped use in C programs.

Petr Machata (pmachata) wrote :

Zones mentioned in zone.tab form a superset of what pytz currently provides in common_timezones, because common_timezones contains no links at all. I think it may actually be more correct to include zones that are links, if such zones are in zone.tab: zone.tab contains no backward links, and every zone listed there belongs to a country (or a crown dependency, semi-autonomous region, etc., but in any case one with an ISO country code).

(Put differently: common_timezones should contain every zone that the function country_timezones can return, plus GMT, US/Eastern, et al.)
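That rule (every zone a country lookup can return, plus a few well-known extras) can be sketched by parsing zone.tab directly. This is an illustrative sketch, not pytz's or Debian's code: parse_zone_tab, build_common_timezones and the EXTRAS tuple are hypothetical, and zone.tab is assumed to use its tab-separated layout of country code, coordinates, TZ name and optional comment.

```python
from collections import defaultdict

# Hypothetical extras: well-known names outside zone.tab that users expect.
EXTRAS = ("GMT", "UTC", "US/Eastern", "US/Central", "US/Mountain", "US/Pacific")


def parse_zone_tab(text):
    """Map ISO 3166 country code -> list of zone names from zone.tab text."""
    by_country = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Tab-separated columns: code, coordinates, TZ name, optional comment.
        fields = line.split("\t")
        if len(fields) >= 3:
            by_country[fields[0]].append(fields[2])
    return dict(by_country)


def build_common_timezones(zone_tab_text, available, extras=EXTRAS):
    """Every zone a country lookup can return, plus extras, if installed."""
    available = set(available)
    names = {z for zones in parse_zone_tab(zone_tab_text).values() for z in zones}
    names.update(extras)
    # Drop anything the installed database does not actually provide.
    return sorted(n for n in names if n in available)
```

Filtering against the installed names keeps the result consistent with all_timezones even if an extra is missing from a given tzdata build.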

RodrigoMoraes (rodrigo-moraes) wrote :

Hi!

This is somewhat related to the problem we have with pytz in Google App Engine. Because of the environment constraints and the short-lived runtime instances, initializing pytz takes an unreasonable amount of time in App Engine. Most of this is because pytz/__init__.py checks for 500+ time zone files merely on import (this alone takes between 0.15 and 0.40 seconds on Google's servers).

I wish those checks were lazy, so that we could implement an alternative way of checking the available time zones. As it is, we need to patch pytz to use it in App Engine.

I started a project to provide a pytz version tuned for App Engine via monkeypatch, but realized that it'd be impossible to make it without actually patching __init__. The result is here:

http://code.google.com/p/gae-pytz/

Please let me know if I can help.

Petr Machata (pmachata) wrote :

Rodrigo,

I'm attaching the patch for what's described in comment #5. I've done some measurements, and the import takes less time than in the upstream package:
```
$ for i in `seq 1 50`; do PYTHONPATH=. python /home/petr/hle.py; done | python -c 'import sys; print sum (float (i) for i in sys.stdin)'
1.06896305084
```

With upstream pytz, I get:

```
$ for i in `seq 1 50`; do PYTHONPATH=. python -c 'import time;x=time.time();import pytz;print time.time()-x'; done | python -c 'import sys; print sum (float (i) for i in sys.stdin)'
3.13412761688
```

So that does help quite a bit. The code relies on system tzdata, meaning it might also be easier to keep it up to date.
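The shell loops above are Python 2. A rough Python 3 sketch of the same measurement follows; total_import_time is a hypothetical helper, and it runs the identical snippet in every child interpreter so that both variants are timed the same way.

```python
import subprocess
import sys

# Program run in each child: time one cold import of the module, print seconds.
_SNIPPET = (
    "import importlib, time\n"
    "start = time.perf_counter()\n"
    "importlib.import_module({module!r})\n"
    "print(time.perf_counter() - start)\n"
)


def total_import_time(module, runs=50, env=None):
    """Sum the import time of `module` across `runs` fresh interpreters.

    `env` is passed to subprocess.run, so a caller can point PYTHONPATH at
    a patched source tree and compare it with the installed package.
    """
    code = _SNIPPET.format(module=module)
    total = 0.0
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, check=True, env=env,
        )
        total += float(result.stdout)
    return total
```

For example, `total_import_time("pytz", env=dict(os.environ, PYTHONPATH="."))` would time a patched tree in the current directory, and the same call without `env` times the installed package.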

Petr Machata (pmachata) wrote :

Bah, the first command line should have been the same as the second. It makes essentially no difference to the timing.

RodrigoMoraes (rodrigo-moraes) wrote :

Petr,
I'm not sure I understand what you're doing there.

Checking all the files at module level (in the module globals) is exactly what I'm trying to avoid. We zip the zoneinfo files for App Engine because of limits on the number of files, so importing pytz ends up producing 500+ IOErrors that are caught internally (which alone takes 0.15 to 0.40 seconds). Without those checks, the time to load pytz is negligible.

I'll try to make a patch to make the file list lazily loaded (and thus monkeypatchable).

Petr Machata (pmachata) wrote :

Instead of building a list of predefined time zones and then sifting through it to discard any that are not on disk (i.e. doing 500+ opens, which is what trunk does), I build the list by traversing the system zoneinfo hierarchy (i.e. doing a couple of opens that don't end in exceptions). Common zones are loaded from zone.tab. I don't really know what you are trying to achieve, so this may not help you at all.

I'm attaching a patch that defers loading all_timezones and all_timezones_set until their first access. It applies on top of the first patch and cuts another ~50% off the import time. I haven't tested it much, though, and I'm not sure the lazy collections mimic their eager counterparts closely enough.
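Whether a lazy collection behaves like a real set is mostly handled in modern Python by collections.abc.Set, which derives comparisons and set operators from `__contains__`, `__iter__` and `__len__`. The following hypothetical LazySet is a sketch, not the attached patch: it loads its contents on first use and returns plain frozensets from `&`, `|` and `-`.

```python
from collections.abc import Set


class LazySet(Set):
    """Read-only set whose contents are computed on first use.

    `loader` is any zero-argument callable returning an iterable, e.g. a
    function that walks the zoneinfo tree (hypothetical usage).
    """

    def __init__(self, loader):
        self._loader = loader
        self._data = None

    def _fill(self):
        if self._data is None:
            self._data = frozenset(self._loader())
            self._loader = None  # release the callable once it has run
        return self._data

    def __contains__(self, item):
        return item in self._fill()

    def __iter__(self):
        return iter(self._fill())

    def __len__(self):
        return len(self._fill())

    @classmethod
    def _from_iterable(cls, it):
        # Set's mixin operators call this; the default would pass `it` as a
        # loader, so return an ordinary eager frozenset instead.
        return frozenset(it)
```

Overriding `_from_iterable` matters: the default implementation calls `cls(it)`, which would treat the iterable as a loader and fail on first access.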

Stuart Bishop (stub)
description: updated
Stuart Bishop (stub) wrote :

I'll take a look at the lazier init patch now that we no longer support Python 2.3; writing a lazy set that worked on Python 2.3 through 2.6 previously seemed impossible.

Another approach: since we distribute the compiled zoneinfo database, we can add an extra data file containing the information we currently generate at run time (the list of all installed time zones and the information parsed from zone.tab). If the resource exists, use and trust it. If it does not, fall back to the existing methods (we need the fallback to support installations that use the system's compiled zoneinfo database).
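The "use the data file if present, otherwise fall back" logic can be sketched as follows. The file name zones.json, its keys, and the compute callback are all hypothetical; the sketch only shows the shape of the fallback.

```python
import json
import os

# Hypothetical name for a data file shipped alongside a bundled zoneinfo tree.
DATAFILE = "zones.json"


def load_zone_lists(zoneinfo_dir, compute):
    """Return (all_timezones, common_timezones).

    Trust the precomputed DATAFILE if it exists; otherwise fall back to
    compute(zoneinfo_dir), e.g. walking a system database that ships no
    such file.
    """
    path = os.path.join(zoneinfo_dir, DATAFILE)
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return compute(zoneinfo_dir)
    return data["all_timezones"], data["common_timezones"]
```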

Changed in python-tz (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Stuart Bishop (stub) wrote :

This is fixed as much as we can fix it. pytz copes if zones it expects do not exist in the zoneinfo database. It does not notice extra zones added to the zoneinfo database.

Changed in pytz:
status: Confirmed → Fix Released
Robie Basak (racb) wrote :

Bug 1995864 has been reported and is relevant to this issue: I think Europe/Kyiv is now being set by users since a tzdata update in 22.04 provided it, but pytz (as shipped in 22.04) doesn't know about it. Any suggestions?

Benjamin Drung (bdrung)
Changed in pytz:
importance: Medium → Unknown
status: Fix Released → Unknown
Changed in pytz:
status: Unknown → New
Benjamin Drung (bdrung)
tags: added: rls-ll-incoming
Benjamin Drung (bdrung)
tags: added: fr-2987
tags: removed: rls-ll-incoming
Benjamin Drung (bdrung)
Changed in python-tz (Ubuntu):
status: Confirmed → Fix Committed
Benjamin Drung (bdrung) wrote :

Attached debdiff for kinetic SRU.

Benjamin Drung (bdrung) wrote :

Attached debdiff for jammy SRU.

Benjamin Drung (bdrung)
description: updated
tags: added: patch
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-tz - 2022.7.1-2

---------------
python-tz (2022.7.1-2) unstable; urgency=medium

  * Team upload.
  * Dynamically determine list of available and common timezones (LP: #207604)
  * Determine IANA (nee Olson) database version dynamically
  * Add autopkgtests to run unittest and own regression tests
  * Update homepage URL
  * Bump Standards-Version to 4.6.2

 -- Benjamin Drung <email address hidden> Tue, 21 Mar 2023 11:21:11 +0100

Changed in python-tz (Ubuntu):
status: Fix Committed → Fix Released
Robie Basak (racb) wrote :

The fix seems appropriate for the development release - thanks to all involved for helping with this!

SRU review for proposed fixes for Jammy and Kinetic: could we be clear about what broken user experience we are seeking to fix here in Ubuntu Jammy and Kinetic specifically? Bug 1995864 is reported as fixed in Jammy. So what is the broken user story that we are fixing with this update?

Changed in python-tz (Ubuntu Jammy):
status: New → Incomplete
Changed in python-tz (Ubuntu Kinetic):
status: New → Incomplete
Benjamin Drung (bdrung) wrote :

Every time a new time zone is added to tzdata in an update, pytz cannot use it, because its list of time zones is hard-coded. For jammy this applies to America/Ciudad_Juarez (comparing pytz with tzdata 2023c).

Test case for jammy:

```
import pytz
pytz.timezone("America/Ciudad_Juarez")
```

This should not raise UnknownTimeZoneError.

Benjamin Drung (bdrung)
description: updated
description: updated
Robie Basak (racb) wrote : Please test proposed package

Hello James, or anyone else affected,

Accepted python-tz into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python-tz/2022.2.1-1ubuntu0.22.10.0 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in python-tz (Ubuntu Kinetic):
status: Incomplete → Fix Committed
tags: added: verification-needed verification-needed-kinetic
Changed in python-tz (Ubuntu Jammy):
status: Incomplete → Fix Committed
tags: added: verification-needed-jammy
Robie Basak (racb) wrote :

Hello James, or anyone else affected,

Accepted python-tz into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/python-tz/2022.1-1ubuntu0.22.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (python-tz/2022.1-1ubuntu0.22.04.1)

All autopkgtests for the newly accepted python-tz (2022.1-1ubuntu0.22.04.1) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

dateparser/1.1.0-1 (amd64, armhf, ppc64el)
senlin/1:13.0.0-0ubuntu1 (amd64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#python-tz

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (python-tz/2022.2.1-1ubuntu0.22.10.0)

All autopkgtests for the newly accepted python-tz (2022.2.1-1ubuntu0.22.10.0) for kinetic have finished running.
The following regressions have been reported in tests triggered by the package:

dateparser/1.1.1-1 (amd64, arm64, armhf, ppc64el)
kopanocore/8.7.0-7.1ubuntu11 (amd64)
senlin/1:14.0.0-0ubuntu1 (s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/kinetic/update_excuses.html#python-tz

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Benjamin Drung (bdrung) wrote :

Retrying those autopkgtests solved the failures.

Verified python3-tz 2022.2.1-1ubuntu0.22.10.0 (kinetic) and 2022.1-1ubuntu0.22.04.1 (jammy):

```
$ python3
>>> import pytz
>>> pytz.timezone("America/Ciudad_Juarez")
<DstTzInfo 'America/Ciudad_Juarez' LMT-1 day, 16:54:00 STD>
```

The autopkgtests are successful:

kinetic:

* amd64: https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic/kinetic/amd64/p/python-tz/20230331_193354_bcf83@/log.gz
* arm64: https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic/kinetic/arm64/p/python-tz/20230401_041113_9b51a@/log.gz
* armhf: https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic/kinetic/armhf/p/python-tz/20230331_193408_e22fd@/log.gz
* ppc64el: https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic/kinetic/ppc64el/p/python-tz/20230331_191940_06a59@/log.gz
* s390x: https://autopkgtest.ubuntu.com/results/autopkgtest-kinetic/kinetic/s390x/p/python-tz/20230401_042402_5cfb8@/log.gz

jammy:

* amd64: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/amd64/p/python-tz/20230331_225012_10bc3@/log.gz
* arm64: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/arm64/p/python-tz/20230401_022638_8db57@/log.gz
* armhf: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/armhf/p/python-tz/20230331_200820_3a223@/log.gz
* ppc64el: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/ppc64el/p/python-tz/20230331_203522_2cd20@/log.gz
* s390x: https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/s390x/p/python-tz/20230401_024737_ac0b3@/log.gz

tags: added: verification-done verification-done-jammy verification-done-kinetic
removed: verification-needed verification-needed-jammy verification-needed-kinetic
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for python-tz has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-tz - 2022.1-1ubuntu0.22.04.1

---------------
python-tz (2022.1-1ubuntu0.22.04.1) jammy; urgency=medium

  * Dynamically determine list of available and common timezones (LP: #207604)
  * Determine IANA (nee Olson) database version dynamically
  * Add autopkgtests to run unittest and own regression tests

 -- Benjamin Drung <email address hidden> Tue, 21 Mar 2023 12:45:55 +0100

Changed in python-tz (Ubuntu Jammy):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-tz - 2022.2.1-1ubuntu0.22.10.0

---------------
python-tz (2022.2.1-1ubuntu0.22.10.0) kinetic; urgency=medium

  * Dynamically determine list of available and common timezones (LP: #207604)
  * Determine IANA (nee Olson) database version dynamically
  * Add autopkgtests to run unittest and own regression tests

 -- Benjamin Drung <email address hidden> Tue, 21 Mar 2023 12:05:43 +0100

Changed in python-tz (Ubuntu Kinetic):
status: Fix Committed → Fix Released