-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-48593: [C++] C++20: use standard calendar / timezone APIs #48601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
4283740 to
d82f990
Compare
|
It seems that std::chrono on GCC (14.3.0, 15.2.0) potentially has a bug that triggers some of our tests. Meanwhile std::chrono on MSVC 19.44 ( 14.44) appears to be pass them and is correct or at least consistent with vendored Below is the explanation and reproduction of the bug. // GCC libstdc++ DST bug reproduction
//
// The AN to AS Transition Bug
// ---------------------------
// Source https://github.com/eggert/tz/blob/c37fbc3249c1a1334948b38f3bca47dee5c11dd1/australasia#L165-L192
// Australia/Broken_Hill used the AN (New South Wales) rules until 2000, then
// switched to AS (South Australia) rules. Under AN rules, DST started the last
// Sunday of October and ended the last Sunday of March. For the 1999-2000
// summer, DST started October 31, 1999 and would end March 26, 2000. February
// 29, 2000 falls squarely within this DST period, so the correct offset should
// be 9:30 base + 1:00 DST = 10:30 (630 minutes).
//
// Why GCC's Data is Wrong
// -----------------------
// When libstdc++ processes the zone transition from AN rules to AS rules (which
// happens in year 2000), it appears to lose or reset the DST state inherited
// from the AN rules. Instead of recognizing that DST is still active from the
// October 1999 transition, it reports offset=570 (just the 9:30 base) with
// save=0. The inconsistency is evident: it returns abbrev="ACDT" (daylight
// time) but the offset and save values indicate standard time. The AN rules
// clearly show DST should be active until the last Sunday of March 2000.
//
// Compile: g++ -std=c++20 -o gcc_dst_bug gcc_libstdcxx_dst_bug.cpp
// Expected: 630 (10:30 = 9:30 base + 1:00 DST)
// Actual: 570 (9:30 = base only, DST missing)
#include <chrono>
#include <iostream>
int main() {
using namespace std::chrono;
auto* tz = locate_zone("Australia/Broken_Hill");
auto info = tz->get_info(sys_days{2000y / February / 29d} + 23h + 23min + 23s);
std::cout << duration_cast<minutes>(info.offset).count() << "\n";
} |
Is the bug reported somewhere? If not, can you do that? |
Yes, I think we should do so. There are also a bunch of code snippets in the C++ and Python codebase that could be removed, IIRC. |
It seems to be related to a known issue, I added comment explaining our case. |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this is looking very good now. @raulcd Do you want to make a final pass over this?
|
@github-actions crossbow submit -g cpp |
|
Revision: 32fbc78 Submitted crossbow builds: ursacomputing/crossbow @ actions-1a8d03ac7f |
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good overall to me. I think there's more than the gcc bug on some of these issues found, see my other review comment.
Nonetheless I am unsure how can we fight the timezone database nuances on those where they are not consistent, historical timezone changes, etcetera, where there's no agreement between different timezone databases.
python/pyarrow/tests/test_compute.py
Outdated
|
|
||
|
|
||
| # TODO(GH-48743): Re-enable when Windows timezone issues are resolved | ||
| # https://github.com/apache/arrow/issues/48743 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by this comment, on that specific job failure we use:
-- The C compiler identification is MSVC 19.44.35222.0
-- The CXX compiler identification is MSVC 19.44.35222.0
not GCC. Isn't this related to differences between historical timezones depending on the timezone database used?
|
|
||
| # Download database | ||
| curl https://data.iana.org/time-zones/releases/tzdata2024b.tar.gz --output ~/Downloads/tzdata.tar.gz | ||
| curl https://data.iana.org/time-zones/tzdata-latest.tar.gz --output ~/Downloads/tzdata.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing here and elsewhere to latest.
|
Thanks for review @raulcd! I pushed a couple of changes since. Nothing major, but still. I'll do another pass tomorrow to make sure all TODOs point to the right issues. |
| # GH-45295: For ORC, try to populate TZDIR env var from tzdata package resource | ||
| # path. | ||
| # | ||
| # Note this is a different kind of database than what we allow to be set by | ||
| # `PYARROW_TZDATA_PATH` and passed to set_timezone_db_path. | ||
| if sys.platform == 'win32': | ||
| if os.environ.get('TZDIR', None) is None: | ||
| from importlib import resources | ||
| try: | ||
| os.environ['TZDIR'] = os.path.join(resources.files('tzdata'), 'zoneinfo') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a comment on the docs, you mentioned keeping the note about ORC and having to set TZDIR, but then here it is apparently not needed for our tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we don't really test for this as I believe ORC still needs TZDIR, see #48601 (review). I'll do another check tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your testing uses conda environments, then orc will pick up the tzdata from the environment: apache/orc#1882
Rationale for this change
Switch to std::chrono for MSVC to be able to use the system-provided timezone automatically on Windows.
What changes are included in this PR?
This adds
chrono_internal.hthat uses C++20 std::chrono timezone/calendar APIs on compilers with support (MSVC only for now) and falls back to vendoreddate.hotherwise.Are these changes tested?
Partially tested locally and partially to be tested on CI.
Are there any user-facing changes?
Yes, Windows users will no longer need to install the IANA tzdb (see instructions here and here). We possibly have tzdb download set up in CI too and should update it appropriately.