19:01:07 #startmeeting 19:01:07 Meeting started Thu Aug 2 19:01:07 2018 UTC. The chair is wumpus. Information about MeetBot at http://wiki.debian.org/MeetBot. 19:01:07 Useful Commands: #action #agreed #help #info #idea #link #topic. 19:01:12 hi! 19:01:31 hi 19:01:33 hi 19:01:35 hi 19:01:42 hi 19:02:03 #bitcoin-core-dev Meeting: wumpus sipa gmaxwell jonasschnelli morcos luke-jr btcdrak sdaftuar jtimon cfields petertodd kanzure bluematt instagibbs phantomcircuit codeshark michagogo marcofalke paveljanik NicolasDorier jl2012 achow101 meshcollider jnewbery maaku fanquake promag provoostenator 19:02:15 hi. 19:02:19 hi 19:02:23 Hi 19:02:34 ello 19:03:10 Hi. 19:03:24 topics? 19:04:08 crickets 19:04:31 crickets are... good I guess 19:04:52 0.17 PRs: https://github.com/bitcoin/bitcoin/pulls?q=is%3Aopen+is%3Apr+milestone%3A0.17.0 19:05:22 0.17 issues: https://github.com/bitcoin/bitcoin/issues?q=is%3Aopen+is%3Aissue+milestone%3A0.17.0 19:05:52 I guess we could discuss the CXXFLAGS stuff 19:05:56 0.17 release schedule: #12624 19:05:57 https://github.com/bitcoin/bitcoin/issues/12624 | Release schedule for 0.17.0 · Issue #12624 · bitcoin/bitcoin · GitHub 19:06:00 I don't really have a good solution for it 19:06:28 #topic CXXFLAGS stuff 19:06:35 Whats the issue? 19:06:58 gmaxwell: autotools forces user CXXFLAGS after our own; so when the user builds with -mno-avx2, the build simply fails 19:07:02 #13789 19:07:04 https://github.com/bitcoin/bitcoin/issues/13789 | crypto/sha256: Use pragmas to enforce necessary intrinsics for GCC and Clang by luke-jr · Pull Request #13789 · bitcoin/bitcoin · GitHub 19:07:19 eh? 19:07:29 it doesn't force it, we do that ourselves 19:07:35 can someone please drop the registed users +q for now? sdaftuar is muted. 19:07:35 ie, autotools calls the compiler with our -mavx2 *followed by* -mno-avx2 19:07:40 well apparaently there's a problem where the build system passes the wrong flags, and luke-jr's PR works around it with pragmas 19:07:43 or at least voice sdaftuar 19:07:43 that seems wrong to me 19:07:50 the intention is for any user-passed flags to be able to override the ones we add. If that's not the case, it's a bug. 19:07:59 cfields: that's the problem in this case 19:08:06 we can't let the user override -mavx2 here 19:08:09 hi 19:08:23 luke-jr: you mean we shouldn't, or we currently don't? 19:08:43 cfields: autotools makes it impossible to override user CXXFLAGS, but for these files we must or we fail to compile 19:08:45 luke-jr: why would the user ever pass -mno-avx2 in the first place? We shouldn't be using -mavx except on special files that need it to compile. 19:08:47 oh, I see what you mean 19:08:57 gmaxwell: to avoid AVX2 instructions 19:09:02 luke-jr: eh? 19:09:07 gmaxwell: it's those special files that fail to compile 19:09:16 hi! 19:09:26 luke-jr: fail how? do we just need to check for more intrinsics? 19:09:31 (I guess it was crickets and the muffled voice of sdaftuar in the dinstance) 19:09:39 luke-jr: If the user wants to avoid executing avx2 instructions they need do nothing. They don't have to pass any special options. 19:09:59 cfields: he is saying that if he builds with CXXFLAGS=-mno-avx2 the compile fails. 19:10:10 gmaxwell: AVX2 is enabled by default with some -march options 19:10:12 gmaxwell: I assume the issue is some failures to compile because of a busted compiler, so there's a desire to be able to avoid them entirely. 19:10:21 https://github.com/bitcoin/bitcoin/issues/13758 19:10:38 luke-jr: why would anyone -march= then -mno-avx2? that just seems busted. 19:10:38 ooooooh 19:11:04 it looks like a really contrives scenario to me 19:11:04 gmaxwell: I'm not sure why laurentb is doing it, but there's no reason they shouldn't be able to 19:11:22 in any case, why not detect the -mno-avx2 and then don't even compile the file? 19:11:24 not worth it polluting the code with all kinds of compiler specific pragmas at least 19:11:25 ok, there are plenty of ways to solve that. I think we can just discuss on the github issue? 19:11:35 gmaxwell: autotools says we're supposed to allow changing CXXFLAGS after configure 19:11:43 agree with cfields , there's no hurry with this 19:11:44 make CXXFLAGS=… 19:11:47 okay 19:12:03 in any case -march= -mno-avx2 sounds like a misguided thing that we shouldn't take a lot of complexity to support, unless someone knows otherwise. 19:12:09 gmaxwell: we turn off all flags except the ones we're testing when doing the autoconf checks so that they don't cause unrelated errors. 19:12:10 Esp because there is -mtune 19:12:27 cfields: no we don't 19:12:38 Unassigned the issues from the 0.17 milestone for now 19:12:42 luke-jr: ones that we add, we do 19:13:02 MarcoFalke: it's a must-fix for 0.17 19:13:22 no, it's not a must-fix for 0.17 19:13:26 I don't see how this is a must fix. 19:13:33 agree fully with MarcoFalke 19:13:36 broken build system.. 19:13:48 any other topics? 19:13:51 "User sets weird options which seem to make no sense as we know, and then something that arguably should work but fails" is not really blocker material. 19:13:59 Lol 19:14:00 Windows 19:14:03 (topic) 19:14:04 I left the pulls for review in the 0.17 milestone, so if reviewers like them, they can still be merged 19:14:21 provoostenator: what about Windows? :p 19:14:21 As in: do we want to fix the Windows unicode stuff, given that there's still two weeks? 19:14:25 right, it's annoying for the specific user, but if you have really specific needs like that you can patch around it 19:14:35 Let's do the unicode stuff for 0.18 19:14:36 #topic Windows (provoostenator) 19:14:37 I think the opinion in the ticket was no. 19:14:51 It would require a leveldb bump and major changes 19:14:55 wumpus: we should find out what the user is doing, might just be some greater confusion... like they want to benchmark each way or something. 19:15:05 Not sure if we can review and test that in such a short time frame 19:15:25 Ok, and we're not getting tons of reports about this either? 19:15:31 Is the unicode stuff just about filenames (wallet)? 19:15:32 gmaxwell: right! 19:15:43 I am looking into restoring the proper warning for -wallet=non-ascii, but that should be all for 0.17 19:15:51 jonasschnelli: Also datadir 19:15:52 only half here, but feel free to ping me 19:15:58 I think it's mostly filename yes, but also labels. 19:16:30 Yeah. We should fix that. But this seems to be open since forever and I don't see a pressing need for 0.17 19:16:33 But afaik I know it works if your system locale is set "correctly". 19:16:53 jonasschnelli: Indeed, anything that is not a regression should go into 0.18 19:17:04 provoostenator: due to the nature of the bug, I think many of the people who would be reporting it may not speak english. So the significance may be a little under-represented. 19:17:12 s/bug/issue/ 19:17:16 I was trying to say what cfields just said. 19:17:22 Good point 19:17:29 Yes. Good point. 19:17:35 provoostenator: wait, labels are broken? 19:17:37 We shouldn't take few reports to mean few issues... but it the change is invasive and not ready, and the issue isn't new... 19:17:58 gmaxwell: agreed. -1 from me as well. Just wanted to throw that out there. 19:17:59 luke-jr: not sure, try with an english system locale and then adding Chinese labels, I can't remember if that works. 19:18:14 provoostenator: last I checked, we had functional tests for non-English labels :/ 19:18:17 But with a Chinese system locale it does work afaik, hence it doesn't seem super urgent. 19:18:24 I think it's a serious issue 19:18:27 We can backport to 0.17.1, if it qualifies as bug fix 19:18:28 Functional tests that run on linux. 19:18:36 but it's risky to do this last minute 19:18:42 and it's not a regression 19:18:44 MarcoFalke: +1 19:19:05 MarcoFalke: agree 19:19:08 Maybe just merge it into master after the 0.17 split so it gets testing quickly. Then we could always backport it if there's strong demand? 19:19:14 it's a bug, maybe one not worth backporting depending on how tidy the fix is. 19:19:37 it's unfortunate that this requires such invasive changes for windows 19:19:52 specific changes not required for other OSes 19:19:59 This is really hard to fix. 19:20:03 which means most of use cannot test it usefully 19:20:24 can it be reproduced in WINE? 19:20:43 luke-jr: I haven't tried, I just use a virtual box with Windows 10 19:20:50 I don't think we should fall back to WINE for testing that issue 19:21:16 Someone recently offered to run and maintain Windows integration builds 19:21:17 It doesn't sound like WINE would have the same issue tbh 19:21:27 Health professions recommend against trying to use wine to solve your problems. 19:21:49 careful we don't spiral to homebrew... 19:21:50 This really needs to be tested on native Windows (Not against testing it in wine additionally, though) 19:22:06 yes 19:22:18 #12613 19:22:19 https://github.com/bitcoin/bitcoin/issues/12613 | [CI] Adding MSVC build to CI check with Appveyor · Issue #12613 · bitcoin/bitcoin · GitHub 19:22:36 If there is a way to run functionfal test on Window 19:22:43 MSVC is a orthogonal issue, I think, to solving this in the gitian builds 19:22:48 I occasionally run them on appveyro 19:23:10 MarcoFalke: it'd be nice to have Travis test it in the meantime 19:23:39 well not orthogonal but I wouldn't be surprised if there are differences mingw versus MSVC in unicoode handling 19:23:40 Sure, if you can get this to work on travis, why not 19:24:07 ken2812221: I think I also got them to run on my windows vm once 19:24:21 gmaxwell: this is one of the cases where you'd recommend stronger liquior instead. 19:24:38 MSVC adds a unicode version of fstream, but mingw does not have that. 19:24:52 MarcoFalke: "once" sounds promising ;) 19:25:16 I could try once more and then write down the steps I did, heh 19:25:59 ken2812221: huh, we could potentially add it and upstream to mingw, then. 19:26:52 why does utf-8 need a special fstream 19:26:54 I'm confused 19:27:06 why does this have to be so messed up 19:27:16 wumpus: just filename 19:27:33 It's fine to read the stream 19:27:34 can't we just drop windows support 19:27:38 * wumpus ducks 19:27:41 Lol 19:28:02 Maybe run it inside at little Linux VM? Can't be worse than Electron :-) 19:28:07 yesss 19:28:21 windows 10 already includes ubuntu right 19:28:29 it's messed up because windows can still run old windows crap and they have to support old api forever, including all the crappy api they built when the company was on the rocks before they discovered scm 19:28:37 let's dump the win32 garbage and use that... 19:28:41 lol. really with the proposed builder stuff, would building a whole linux system really be worse? :P 19:28:50 wumpus: Doesn't a brand new win10 update add a bunch of compat stuff that would avoid these issues? I don't think it'd be crazy to consider dropping support for anything less than that reasonably soon. 19:29:04 cfields: it does! that's true 19:29:05 Not by default, but you can install it. I once did the inception thing: Gitian builder VM inside Ubuntu inside Windows inside Virtual Box on Mac. It crashed. 19:29:27 cfields: I dont know the details, but my understanding is that a lot of people don't want to run windows 10 due to integrated adware or something? 19:29:56 gmaxwell: the latest update, at least for me, even forced me to allow analytics data 19:30:07 whoa 19:30:17 Yeah it's much more invasive 19:30:27 They're definately going for the get hacked by random people on the internet or get spied on by us sales pitch. 19:30:38 gmaxwell: I suppose those people would be used to running outdated versions of things, then :\ 19:30:58 (or both) 19:31:22 ugh 19:31:35 then again, this is not a regression 19:31:40 what works on windows, works. 19:31:56 thats a possibility too, and better than dropping support for windows... 19:32:07 the question is whether we should change 10000 lines just to accomodate unicode in it 19:32:38 It seems really surprising that we'd need to change more than some wrapper functions. 19:32:45 ^ 19:32:55 yes it seems a design error in the first place if this cannot be wrapped somehow 19:33:27 agreed. And that's a reasonable thing to fix, imo. IIRC it also means we have trouble adding filesystem sandboxing. 19:33:28 I do think ken2812221's PR is fairly ok in that regard 19:34:30 yes the reason I did the fs:: stuff is to allow for filesystem sandboxing 19:35:43 wumpus: more is required though, right? I assume something along the lines of ken2812221's PR would help? 19:36:55 cfields: I had it down to only changes in fs.h and adding a fs.cpp, afaik 19:37:08 ah, ok. nm then. 19:37:38 but maybe more is needed now that the code evolved further... 19:38:07 I hope we can find a set of changes that a reasonable independant of windows, so that the windows fix is just a couple files changed. 19:38:25 I think my main drawback in ken2812221 's PR is that it changes .string to stringu8 or something, which seems unnecessary, all strings should be utf-8 19:38:48 if you make sure the wrapper does that you don't need to change it everywhere in the code 19:39:42 wunpus: That's how boost's path work. It need to pass a utf8 converter. 19:40:04 ken2812221: yes... 19:40:21 but our own wrapper could do that automatically 19:40:23 Actually I have thought that before, but it need to patch boost. 19:40:36 didn't we want to get rid of boost anyway? 19:40:39 or wrap boost 19:40:40 maybe this is a good opportunity 19:40:47 we'll do so, in time 19:41:12 yes 19:41:21 any other topics? 19:41:58 Topic suggestion: Leveldb FD usage on x86_64 19:42:59 #topic leveldb FD usage on x86_64 (gmaxwell) 19:43:13 There was a recent report from a user hitting the select limit on his x86_64 linux host. Inspection with lsof shows that leveldb is using a lot of FDs on nodes where we expected it to be mostly using mmap. Apparently leveldb has a number of mmaps limit, as far as I know there isn't any reason we shouldn't increase it. 19:43:47 (seperately we should move to using poll ... but increasing the mmap limit should be a ~1 line change, unless someone knows a reason to not do so) 19:44:23 I think this limit was increased recently 19:44:38 specifically for 64-bit OSes, as it was deemed to be no problem 19:44:51 of course, I'm not surprised it turns out to be it is... :/ 19:45:27 the reason to increase it was something with performance 19:46:03 #12495 maybe? 19:46:06 https://github.com/bitcoin/bitcoin/issues/12495 | Increase LevelDB max_open_files by eklitzke · Pull Request #12495 · bitcoin/bitcoin · GitHub 19:46:21 "utACK ccedbaf - with the comment that I think relying on undocumented behavior of leveldb is risky. " 19:46:35 yeah, of course this is going to bite us in the ass 19:47:25 so revert? 19:47:33 upstream has updated a bunch of stuff lately (namely moving to c++11 by default and dropping the need for lots of platform-dependent code). Has this been addressed as well, by any chance? 19:48:08 wumpus: hm so reading this it sounds to me that there is a seperate limit of 1000 mmaps. And running into that limit is what is causing more FDs to be used than we expected. 19:48:23 wumpus: so it might be possible to not revert that patch, but instead increase the maximum mmaps. 19:48:24 (that bump is going to be painful, btw) 19:49:21 gmaxwell: I like switching to anything else than select() though! 19:49:29 we've been stuck with select with no good reason 19:49:32 wumpus: yes, I think we should do that too! 19:49:51 gmaxwell: that would explain why i saw exactly 999 mmaps 19:49:54 my reason used to be, well, we're going to switch to using libevent for P2P any time now! 19:49:58 BlueMatt: and phantomcircuit: both previously had private patches to change to poll. 19:49:58 but that's been years 19:50:01 let's just do it 19:50:17 wumpus: any time now! 19:50:19 change to poll() for unix-ish OSes 19:50:24 sipa: my comments above about the map limit are based on ossifrage (reporter's) work, and some of the comments in that PR above. 19:50:30 wumpus: sorry :( 19:51:05 In any case, I think there are two issues: (1) switch to poll (duh but can we do it for 0.17) and (2) leveldb should probably be allowed more maps unless we know some reason to not do that. 19:51:06 cfields: nooOO sorry I didn't mean it as an attack on you, I could have done it, many other people could have done it, but we didn't 19:51:24 and maybe (3) potentially revert that PR; but given my understanding, I don't think we need to. 19:51:49 Leveldb being map limited is probably going to end up bad for performance even if we no longer had the FD issues. 19:51:49 man mmap doesn't talk about a specific mmap limit 19:52:10 there is no mmap limit afaik 19:52:15 ^ 19:52:38 Thats where my concern about increasing leveldb's mmap usage comes from-- I dont understand why the limit is there. 19:52:41 there's definitely something that limits it. Some unrelated sysctl ? 19:52:49 I know i've seen it documented somewhere. 19:52:49 Unless they're trying to reduce TLB thrashing or something. 19:54:41 cfields: for 0.18 we should definitely update leveldb though 19:54:55 I bumped my mmap file limit up to 4k, currently 1180 ldb files are mapped and it isn't using any fds for ldb files 19:54:57 https://github.com/bitcoin/bitcoin/pull/12495#issuecomment-367815005 < eklitzke suggests increasing leveldb's mmap limit here. ossifrage also apparently did that and reported it fixed his problems, though his specific problems are likely better fixed by switching to poll. 19:55:04 for 0.17 I'd propose to keep it like this, not do anything wild... 19:55:13 cfields: if i read correctly, the internet says sysctl vm.max_map_count 19:55:40 I thought it was probably too late to switch 0.17 to poll, but increasing the leveldb map count seems like an easy mitigation that probably improves performance. 19:56:01 [gmaxwell@bean ~]$ sysctl vm.max_map_count 19:56:01 vm.max_map_count = 65530 19:56:12 is there a per-process one? 19:56:35 vm.max_map_count = 65530 (same on my box, but that number seems a bit low to be a global count) 19:56:37 gmaxwell: I'd be really nervous about that. There are so many specific quirks related to select/poll/epoll/kqueue 19:57:06 It is per process 19:57:19 (not against the change at all, only hesitant for the 0.17 cycle) 19:57:34 vm.max_map_count = 65530 19:57:44 vm.max_map_count = 65530 19:57:46 checked on 3 linux boxes, same output 19:57:54 likewise. 19:57:54 I... strongly doubt this is an issue 19:58:24 So it might be prudent for 0.17 to bump the leveldb max to 4k or something. 19:58:32 if you manage to map 65000+ files, well, I"m not surprised you run into trouble 19:58:45 I still just wish I better understood why they had that 1000 default. 19:59:01 what happens if the limit is hit? 19:59:28 gmaxwell, mine only worked on linux and was actually just broken on windows 20:00:05 *dong* 20:00:19 #endmeeting