19:01:43 #startmeeting
19:01:43 Meeting started Thu May 18 19:01:43 2017 UTC. The chair is wumpus. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:43 Useful Commands: #action #agreed #help #info #idea #link #topic.
19:02:10 hi.
19:02:17 yow
19:02:26 hi
19:02:26 I just have one topic for today, but I'll let others suggest theirs
19:02:34 [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/28c6e8d71b3a...ea6fde3f1d26
19:02:34 bitcoin/master 618d07f Jorge Timón: MOVEONLY: tx functions to consensus/tx_verify.o...
19:02:35 bitcoin/master ea6fde3 Wladimir J. van der Laan: Merge #8329: Consensus: MOVEONLY: Move functions for tx verification...
19:02:39 topics?
19:02:49 #topic clientside filtering
19:02:55 ack
19:03:26 BIP148
19:03:33 (after clientside filtering etc)
19:03:50 I don't think that works CodeShark, I think only the chair can set the topic
19:03:55 #topic clientside filtering
19:03:58 :)
19:04:19 so there are several filtering options with different performance tradeoffs
19:04:40 bloom filters have been typically considered - but there are some other ideas that might be worth considering
19:04:56 Filter for BDF, read gmaxwell's reply: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2016-May/012637.html
19:05:00 roasbeef has worked on an idea based on golomb coded sets
19:05:27 «The most efficient data structure is similar to a bloom filter, but you use more bits and only one hash function. The result will be mostly zero bits. Then you entropy code it using RLE+Rice coding or an optimal binomial packer (e.g. https://people.xiph.org/~greg/binomial_codec.c).»
19:05:55 yes?
19:05:56 gcs sacrifices CPU for space
19:06:14 I think what we would need is data about the filter size for the last 100000 blocks...
19:06:16 filters are smaller, but queries are more computationally expensive
19:06:19 CodeShark: CPU for who when is always the question.
19:06:24 jonasschnelli: I have that
19:06:32 roasbeef: Oo... share?
19:06:33 hey, roasbeef! :)
19:07:01 what BIP37 does is very cpu expensive for the serving party, which is why it leads to dos attacks.
19:07:17 with any of the map based proposals that goes away and the cost to construct is not very relevant.
19:07:24 constructing a gcs isn't very computationally expensive
19:07:38 more so than bip37
19:07:43 Similarly, cost to lookup is not very relevant, the receiver will decode one per block.
19:07:48 the queries are a little more computationally expensive than bloom filters, but that is done on client
19:07:51 jonasschnelli: i have a csv file of stats for the entire chain, can easily get the last 100k out of it, the csv file itself is 14MB
19:07:57 sipa: maybe the lots of hash functions make it more expensive than you might guess.
19:08:06 roasbeef: I take the complete one, thanks. :)
19:08:14 but gcs only needs to be computed once per block
19:08:29 CodeShark: do you suggest this as something that blocks commit to?
19:08:36 or something that a full node would precompute and store?
19:08:42 with bloom filters, there are several hash functions, with the gcs based approach, there's a single hash function. but the set itself is compressed, so you need to decompress as you query
19:08:43 the latter for starters
19:08:44 i suppose the last
19:08:46 precomp and store
19:08:50 sipa: something a node would precompute and store, to start
19:08:58 okay
19:09:19 what would be stored in the set?
19:09:23 I'm dubious that we'd get state of the art performance from golomb coding, but interested to see.
19:09:26 Can be done after the block has been connected
19:10:04 sipa: I believe the discussion is the 'bloom map' proposal.
19:10:13 roasbeef was suggesting two filters - one for super lightweight clients, another for clients that require more sophisticated queries
19:10:40 What are the differences? The tx template types?
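The construction described above (a single hash function, a sorted set stored in compressed form, decompression while querying) can be sketched roughly as below. This is only an illustrative toy, not the encoding from roasbeef's draft: the function names, the use of SHA-256, the bit-string representation and the parameter `p` are all assumptions made for the example.

```python
import hashlib

def hash_to_range(item: bytes, n: int, p: int) -> int:
    """Map an item into [0, n * 2**p) using one hash function (assumed: SHA-256)."""
    digest = hashlib.sha256(item).digest()
    return int.from_bytes(digest[:8], "big") % (n << p)

def gcs_encode(items, p):
    """Return (n, bits): the sorted hashed values, delta-encoded with Golomb-Rice codes.

    bits is a string of '0'/'1' characters for readability; a real encoder
    would pack bits into bytes. Assumes p >= 1.
    """
    n = len(items)
    if n == 0:
        return 0, ""
    values = sorted({hash_to_range(it, n, p) for it in items})
    out, prev = [], 0
    for v in values:
        delta, prev = v - prev, v
        q, r = delta >> p, delta & ((1 << p) - 1)
        out.append("1" * q + "0")        # quotient in unary, terminated by 0
        out.append(format(r, f"0{p}b"))  # remainder in exactly p bits
    return n, "".join(out)

def gcs_contains(n, bits, p, item):
    """Decode deltas one by one and stop as soon as the target is reached or passed."""
    if n == 0:
        return False
    target = hash_to_range(item, n, p)
    pos, value = 0, 0
    while pos < len(bits):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1                          # skip the terminating 0
        r = int(bits[pos:pos + p], 2)
        pos += p
        value += (q << p) | r
        if value >= target:
            return value == target
    return False
```

With `p = 20` the false-positive rate of this toy is about 2**-20 per query. The lookup cost is a linear decode of the filter, which is the CPU-for-space trade mentioned in the discussion.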
19:10:46 the former would only encode UTXOs, the latter would also encode witness data
19:10:56 encode witness data?!
19:11:16 well, if you want to query for whether a particular execution path has been taken - necessary for things like lightning
19:11:32 basic has: outpoints, script data pushes. extended has: witness stack, sig script data pushes, txids
19:11:46 but do you need to _search_ based on witness data?
19:11:51 i understand you may want to see it
19:12:01 but you know what UTXOs to query for, no?
19:12:35 I'm guessing revocation enforcement might be outsourced to nodes that cannot know the exact transaction format - only some key
19:12:50 roasbeef, wanna comment?
19:12:53 Yes, requesting it is fine, searching on it? Be careful: it has serious long term implications if you expect that data will even be readily available. I am doubtful five years from now most nodes will have any witness data from more than a year back.
19:13:22 (witness data also means non-utxo transaction data in that above comment)
19:13:54 aside, I'm glad to hear this discussion has moved past just replicating the BIP37 mechanism.
19:13:58 rationale to include witness data was to allow light clients to efficiently scan for things like reusable addresses (stealth addresses), i think my model of how folks do that on-chain these days is dated though, i guess they stuff a notification on OP_RETURNs?
19:14:19 i'm not sure that is worth the cost
19:14:30 also, individual scriptPubKey pushes?
19:14:46 if anything, my preference would just be outpoints and full scriptPubKeys
19:14:50 they do make the extended filters quite a bit bigger (i have testnet data also)
19:14:54 well no one does those things in practice, and everyone who previously has implemented them that I'm aware of performed all scanning via a centralized server, even though they could have matched on the OP_RETURN.
19:15:23 we can always start with the simplest minimal filter and then add more if we find use cases
19:15:24 gmaxwell: well the intention was to allow the new light client mode to actually make using them practical without delegating to a central server
19:15:41 roasbeef: that was already possible with BIP37 and the prior design.
19:15:47 Can we start with adding the same elements that bip37 does?
19:15:48 sipa: so including the op-codes?
19:16:02 Usability of SPV clients that scan using BIP37 is really poor though, thus the rise of electrum.
19:16:04 roasbeef: bah, and 1) further encourage op_returns and 2) make them even more expensive for full nodes?
19:16:26 jonasschnelli: the things BIP37 added largely turned out to be a mistake that really degraded BIP37 so I hope a new proposal would do less.
19:16:40 well the degradation problem doesn't exist here
19:16:47 as the filter is not cumulative
19:17:02 sipa: is there a way to do it without OP_RETURN?
19:17:05 yes, but you still need a bigger filter for same FP ratio. It's just less awful. :)
19:17:14 luke-jr: sure, payment protocol like systems
19:17:27 well, true, but then you don't need the crypto stuff for it
19:17:42 i think that's a separate discussion and probably not one for here
19:17:48 k
19:17:58 for starters we should look at the most basic use cases
19:18:00 Yea, we should have a subcommittee. :P
19:18:07 jonasschnelli, CodeShark, roasbeef: is there a use case for individual pushes in scriptPubKeys?
19:18:13 the action is probably to define a set of filters and create a spec that leaves room for future filter types
19:18:22 jonasschnelli: indeed
19:18:30 especially in a world where everything is P2PKH/P2SH/P2WPKH/P2WSH
19:18:38 once we have the framework for adding new filters, it should be easy to do
19:18:43 jonasschnelli: multiple filter types can result in n-fold overhead, which will be a significant pressure against defining many.
19:18:47 sipa: sure, the filter is smaller if one doesn't include the op-code as well
19:19:00 roasbeef: eh?
19:19:26 i must be misunderstanding something then
19:19:27 oh you mean insert the _entire_ thing
19:19:36 yes, just the whole scriptPubKey
19:19:41 1 element per output
19:19:59 well, and another one for the outpoint
19:20:08 mhmm, only advantage to data pushes in that case is in a world where bare multi-sig is actually used
19:20:16 sipa: wait why?
19:20:23 gmaxwell: why what?
19:20:27 roasbeef: yes, which we don't expect that world to exist.
19:20:51 roasbeef: yes, the reason it's in BIP37 is for bare multisig support... but i don't think that's very interesting now
19:20:52 sipa: I expect one insert per output. The scriptpubkey. Why would you insert anything else (for normal functionality)
19:21:06 s/now/ever/ but hindsight is 20/20
19:21:18 blockchain isn't a message bus. :P
19:21:19 i guess if you want to look for an outpoint, you can always search for its scriptPubKey
19:21:26 sipa: right.
19:21:27 okay.
19:21:58 in BIP37 there was a reason to separate it, as it would be less bandwidth if you wanted a specific outpoint, despite there being many scriptPubKeys with it
19:22:08 but here, that reason doesn't really matter i think?
19:22:30 roasbeef: what do you think? just a filter with scriptPubKeys?
19:22:33 sipa: the privacy leak from correlated data still exists in map proposals, based on what blocks you choose to scan further, though much less severe than BIP37. Keep that in mind.
19:22:58 if it's just spk's, then how does one query the filters to see if an outpoint has been spent?
19:23:35 roasbeef: by querying for the scriptPubKey that outpoint created
19:23:41 roasbeef: which you will always know, i think?
19:23:55 roasbeef: by looking for its spk.
19:24:09 sipa: which would require adding parts of the witness/sigScript though?
19:24:14 ?
19:24:26 i'm confused
19:24:31 me too :)
19:24:38 txhash:txindex -> scriptPubKey
19:24:38 maybe we should do this outside of the meeting
19:24:39 roasbeef: has nothing to do with the witness. You validate the transaction, you know the content of the outpoint.
19:24:51 it seems we're doing protocol design here now
19:25:03 12:17 < gmaxwell> Yea, we should have a subcommittee. :P
19:25:17 anyhow, we don't need to decide the specifics of what goes in the filter right now
19:25:22 agree
19:25:27 ok, sure, to summarize: we have working code for the construction, have nearly finished integrating it into lnd, have a BIP draft that should be ready by next week-ish (will also integrate feedback from this discussion)
19:25:31 I like the idea of creating a framework that allows us to arbitrarily define filters later on
19:25:32 i think it's an interesting thing to research further
19:25:41 not sure what else needs to be discussed here
19:25:41 well we aren't deciding anything right now... :)
19:25:43 CodeShark: I do not.
19:26:00 BTW: kallewoof has a draft impl. on serving filters over the p2p (though bloom): https://github.com/kallewoof/bitcoin/pull/1/files (in case someone wants to drive this further)
19:26:05 CodeShark: there is an n-fold cost to additional filters. It is unlikely to me that nodes would be willing to carry arbitrarily many in the future.
19:26:14 CodeShark: there might be a reasonable case for more than one, sure.
19:26:56 In any case, I think this is good to open up more discussion and participation.
19:27:09 I'm quite happy to hear that there is activity in this area and I'd like to help.
19:27:10 gmaxwell: I see this point but I don't think it would hurt if the specs would allow new filter types
19:27:13 gmaxwell: point is the code complexity to support adding arbitrary filters isn't that great and it avoids the bikeshed in writing up the initial BIP ;)
19:27:30 jonasschnelli: yea sure, whatever, but that's just a type parameter.
19:27:40 gmaxwell: right.
19:28:12 end of topic?
19:28:13 * roasbeef now understands what sipa was referring to
19:28:31 I don't think any other have been proposed?
19:28:47 you're gonna regret saying that.. :P
19:29:07 quick: high priority PRs.
19:29:09 nearly halfway time
19:29:14 kallewoof had also an approach that peers could serve digests of filters to check the integrity among different peers
19:29:15 #topic high priority PRs
19:29:33 small topic for later: bytes_serialized
19:29:34 Congrats Morcos on the merge of the new fee estimator stuff.
19:29:43 \o/
19:29:45 it will need cleanups, but that's fine
19:29:56 thanks, quick PSA.. if you run master now it'll blow away your old fee estimates, you might want to make a copy
19:30:01 quite a few high priority PRs were merged this week, so there's place for new ones, please speak up if there's any that block further work for you
19:30:04 "micros" not withstanding.
19:30:17 i'm hoping to get an improvement which makes the transition more seamless before 0.15
19:30:46 sipa: i'm basically done reviewing per-txout (#10195), looks awesome! running some benchmarks now.
19:30:50 https://github.com/bitcoin/bitcoin/issues/10195 | Switch chainstate db and cache to per-txout model by sipa · Pull Request #10195 · bitcoin/bitcoin · GitHub
19:30:57 sdaftuar: thank you so much
19:31:15 I've been testing per-txout. Survived a few crashes so far.
19:31:31 I've been testing #10195 for a while, haven't run into any problems
19:31:35 https://github.com/bitcoin/bitcoin/issues/10195 | Switch chainstate db and cache to per-txout model by sipa · Pull Request #10195 · bitcoin/bitcoin · GitHub
19:31:35 morcos, dont look now but it's being used in anger on multiple large wallet services :)
19:31:46 instagibbs: "in anger" ?
19:31:55 "doing it live"
19:32:02 "hold my beer"
19:32:30 heh.. fools, the whole reason to merge it into master was to get it some more testing
19:32:35 luke-jr: have you done the multiwallet rebasing?
19:32:44 there's not many explicit acks on https://github.com/bitcoin/bitcoin/pull/10339
19:32:48 I didn't realise jtimon's PR was merged?
19:32:49 morcos, well, other services were doing crazy things.. (ok enough off-topic)
19:33:03 luke-jr: which one?
19:33:05 so, ok, any new ones?
19:33:17 jtimon: args refactor
19:33:17 i'd like more review on #10295, it is blocking my ipc prs
19:33:19 https://github.com/bitcoin/bitcoin/issues/10295 | [qt] Move some WalletModel functions into CWallet by ryanofsky · Pull Request #10295 · bitcoin/bitcoin · GitHub
19:33:27 ryanofsky: ack, i started reviewing that
19:33:32 I have added #10240 today
19:33:34 https://github.com/bitcoin/bitcoin/issues/10240 | Add HD wallet auto-restore functionality by jonasschnelli · Pull Request #10240 · bitcoin/bitcoin · GitHub
19:33:38 jonasschnelli: sgtm
19:33:44 luke-jr: I see #9494
19:33:45 https://github.com/bitcoin/bitcoin/issues/9494 | Introduce an ArgsManager class encapsulating cs_args, mapArgs and mapMultiArgs by jtimon · Pull Request #9494 · bitcoin/bitcoin · GitHub
19:33:50 ok, looks like 4 days ago it was; I'll rebase multiwallet then
19:33:55 luke-jr: thank you
19:34:02 luke-jr: great. I promise to test
19:34:05 luke-jr: thank you!
19:35:02 ryanofsky: will do the 10295 review. Thanks for the info
19:35:07 short point: wrt the pruned-node-serving, see http://bitcoin.sipa.be/depths.png
19:35:11 added 10295 and 10339
19:35:22 #topic pruned-node serving
19:35:31 see that graph, the title is wrong
19:35:33 Currently overhauling the BIP
19:35:47 it shows the relative depth of each block downloaded from my node _excluding_ compact blocks
19:36:10 gmaxwell did some statistical analysis on it
19:36:38 Sipa's data is interesting. 144 is too small for sure. 1008 is fine. I'm of the view that we don't need more than a dozen or so blocks of headroom. I think the BIP should be written based on what you should keep. How you decide where to fetch depends on exactly what you're doing.
19:37:05 hm
19:37:27 I found no real evidence of a real preference for N weeks in sipa's data, but rather, advantages for doing 1-day 2-day 3-day ... etc. But 'day' is a lot more than 144 blocks, because of hashrate increases.
19:38:04 You can process the data to roughly remove IBDing peers and the fall off is pretty stark.
19:38:18 note sipa's graph ignores depth 0.
19:38:33 it'd be a hockeystick if it included 0
19:38:44 What would you recommend for "day" instead of 144, calc in the historical hashrate increase?
19:38:53 also 0 data is inaccurate because it excludes compact blocks
19:39:18 gmaxwell: didn't you suggest 288?
19:39:20 jonasschnelli: I think we should make the first threshold 288. It's more than enough to cover a 'day' in practice.
19:39:39 288 and 1008...
19:39:59 But then the current minimum (prune=550) would not allow to signal the LOW mode?
19:40:08 the current minimum is 288
19:40:11 and then peers should estimate what they need (based on time, or headers if they have them) and choose where to connect. The estimate should be conservative but it doesn't need to be a 100 block headroom, a dozen blocks should be fine. If you get headers and find that you need more, you'll disconnect and go elsewhere.
19:40:13 Or is 288 including headroom?
19:40:24 the 550 is just so you don't set a prune limit which you have no hope of respecting
19:40:26 the minimum is 288 blocks.
19:40:30 it's out of date with segwit
19:40:44 and we'll blow over the prune setting to preserve 288 blocks.
19:40:55 i think the calculation is presented in the code comments
19:41:03 Yes. 288 is the minimum. So we should remove the BIP headroom/buffer from the BIP
19:41:22 I think eventually we should be changing the prune setting to be enum-like but that's another matter.
19:41:58 jonasschnelli: I think the BIP shouldn't have any buffer. "You store X from your tip" "You store Y from your tip" it can then make advice to users on how to choose connections. but the requirement is just what you promise to store.
19:42:13 gmaxwell: ack
19:43:12 The advice can say to use the best info you have available (time or headers if you have them) to figure out what you need, and then give enough headroom maybe 6 or 12 blocks that you can fetch parents. The cost of connecting to someone that doesn't have what you need is not that great. You'll request headers from them, learn you need blocks they don't have and you'll disconnect them and connect
19:43:18 to someone else.
19:44:01 For the 1008 I guess the BIP can no longer state blocks for 1 week. Now the question is to use 2016 or say it 3.5 days..
19:44:17 ?
19:44:35 i think it should just say 1008 or 2016 blocks or so, and not make any connection with time
19:44:44 From what I understood is that 144 is too little for a day regarding the increasing hash-rate
19:44:54 jonasschnelli: I'll catch up with you later today, I don't have my processed results in front of me. But I think I found that after eliminating IBDs there were very few fetches in sipa's data past 1000 blocks deep. And indeed, it shouldn't mention time.
19:45:37 But light client implementations are really looking for "days" rather than blocks.. but, sure, they can do their homework... but would have been nice to mention day values in the BIP.
19:45:43 But maybe they are too inaccurate
19:45:47 The bit(s) should just be defined as "I claim I will keep at least X blocks deep from my tip, maybe I keep more, maybe not."
19:45:54 jonasschnelli: light clients know how many blocks they are behind after header sync
19:45:58 jonasschnelli: anyone using these bits will fetch headers.
19:46:15 Indeed.... okay. Got it.
19:46:46 now, before you connect you won't have headers and you'll need to make a time based guess. If you guess wrong you'll need to disconnect and go elsewhere. Not the end of the world.
19:47:16 Yes. I agree on that. Re-connecting should be hard.
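The peer-selection advice discussed above (nodes promise a keep depth such as 288 or 1008 blocks; the client estimates how far behind it is, adds a small headroom, and picks accordingly) could look roughly like this. The function names, the 600-second default interval and the 12-block default headroom are assumptions for the sketch, not text from the BIP.

```python
def blocks_behind_estimate(seconds_since_last_block, avg_block_interval=600):
    """Time-based guess used before any headers are available.

    Assumption: 600 s per block. The discussion notes that real days have
    contained more than 144 blocks, so a smaller interval is the more
    conservative choice.
    """
    return -(-seconds_since_last_block // avg_block_interval)  # ceiling division

def pick_keep_depth(blocks_behind, headroom=12, thresholds=(288, 1008)):
    """Return the smallest advertised keep depth that covers the need.

    Returns None when no pruned class suffices and a full node is needed.
    """
    need = blocks_behind + headroom
    for depth in sorted(thresholds):
        if need <= depth:
            return depth
    return None
```

If the guess turns out wrong after header sync, the client disconnects and picks again with the exact block count, which is the retry path described in the discussion.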
19:47:37 Maybe even an additional dns query may be involved (in case you filter)
19:48:10 even if it happens, it'll happen just once
19:48:31 Yeah,... shouldn't be a problem for clients
19:48:34 because even if you connect to a peer that does not have enough blocks, they'll have the headers to teach you how many blocks you are behind
19:48:39 so i don't think it's such a big issue
19:49:03 done topic?
19:49:08 I think I mentioned it on the list, but it should be clear that these bits should still mean that you can serve headers for the whole chain.
19:49:33 #topic bytes_serialized (sipa)
19:49:38 thanks
19:49:42 Kill with fire (sorry wumpus)
19:49:43 gmaxwell: seems obvious.. but I'll mention it
19:49:43 :P
19:49:54 so currently gettxoutsetinfo has a field called bytes_serialized
19:50:03 which is based on some theoretical serialization of the utxo set data
19:50:09 I think there's something to be said for a neutral way of representing the utxo size, that doesn't depend on estimates of a specific database format
19:50:17 wumpus: agree with that
19:50:20 what I said to sipa the other day was that if we list the total bytes in values and the txout counts, that lets you come up with whatever kind of serialized size estimate you want.
19:50:45 but would you be fine with it just being the size of keys+values in a neutral format, _not_ accounting for the leveldb prefix compression?
19:50:51 sipa: yes
19:50:52 If you want you could multiply that count by 36 and add the values and that gives you the size for the dumbest serialization that hopefully no one would use.
19:50:52 values counted as 8 bytes, or compressed?
19:51:08 sipa: that'd be fine really, and the format change provides opportunity to change the definition
19:51:14 wumpus: agree
19:51:22 okay if wumpus and sipa agree I'll shutup.
19:51:31 luke-jr: no strong opinion. do you?
19:51:42 sipa: I don't think the compression should be exposed, ideally.
19:51:48 luke-jr: seems fair
19:51:49 wumpus: the only concern I had with a really neutral figure is that it's misleading.
19:51:51 not a strong opinion though
19:51:51 luke-jr: just a fixed size seems ok to me
19:52:01 luke-jr: that's more future proof likely
19:52:07 luke-jr: so we can have a statistic to compare over time
19:52:11 can't we output more than one thing?
19:52:27 wumpus: indeed
19:52:42 e.g. a naive serialization would have 32 bytes for txid, but the reality is probably under 16 due to sharing. But as long as it doesn't require scanning that data I guess I don't care.
19:52:47 morcos: so #10396 reports the actual disk usage
19:52:48 https://github.com/bitcoin/bitcoin/issues/10396 | Report LevelDB estimate for chainstate size in gettxoutsetinfo by sipa · Pull Request #10396 · bitcoin/bitcoin · GitHub
19:52:53 morcos: and the total number of utxos is also reported
19:53:15 we should definitely report the actual disk usage too!
19:53:23 yeah i'm sorry if i'm behind, but i think actual disk usage is useful, even if we want this .. ok, that's all i was saying
19:53:30 agreed
19:53:30 yes yes, absolutely
19:53:44 the point is that the current bytes_serialized tries to mimic disk usage, but fails
19:53:45 the leveldb usage is a noisy thing that goes up and down based on the mood of the table compacting gods.
19:53:47 (although I guess users can just du the directory?)
19:53:50 and will fail even more post per-txout
19:54:17 so if we drop the requirement that bytes_serialized has anything to do with disk usage, all is good
19:54:25 gmaxwell: yep, it's less useful for reporting as statistics
19:54:34 sipa: indeed; I never assumed it did really
19:54:58 to me it was just 'serialization size of utxo in an arbitrary, but constant, format'
19:55:00 huh what im here
19:55:19 sipa: would make sense to rename the field too
19:55:21 wumpus: ok, so 10195 removes bytes_serialized - i'll create a separate PR afterwards to add a (new) bytes_serialized again
19:55:25 wumpus: agree
19:55:32 wumpus: it will be odd if the serialized size is larger than the database but not that odd.
19:55:47 gmaxwell: at least it will be obvious that it has nothing to do with it then!
19:55:49 (after all we don't want people to report weird jumps in statistics, renaming the field is a good hint)
19:56:07 sipa: maybe it should be renamed?
19:56:13 luke-jr: yes, it should be
19:56:17 "bogosize"
19:56:21 bogosize++
19:56:22 hash_serialized is renamed too
19:56:28 hahaha bogosize
19:56:34 ok, deal
19:56:39 should be in nibbles.
19:56:42 :P
19:56:43 lol
19:56:44 :D
19:56:46 in nepers
19:56:49 buy one get one size?
19:56:55 ehats the base e entropy unit?
19:56:58 gmaxwell: yes
19:57:27 can I add an OP_CHECKBOGOSIZE? *hides*
19:57:28 Good. (that was supposed to be a "Whats?" but seems you were a step ahead of me)
19:57:39 ah, no, nats
19:57:47 nepers are just for ratios, like db
19:57:53
19:58:02 time to close the meeting I think
19:58:02 2 minutes
19:58:04 review begging?
19:58:09 :P
19:58:11 we already did that one
19:58:13 ah k
19:58:17 defer BIP148 to next week?
19:58:22 (though if you have any proposals just say so)
19:58:24 https://github.com/bitcoin/bitcoin/pull/10333 <-- my beg
19:58:33 luke-jr: oh forgot about that one
19:58:43 it's okay, a week might be good anyway
19:58:50 I'm sure you can discuss it in one minute.
19:59:00 :P
19:59:03 we need a meeting extension block
19:59:04 * morcos refrains
19:59:09 #endmeeting
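The "dumbest serialization" estimate floated during the bytes_serialized topic (multiply the txout count by 36 and add the value bytes) works out as below. This is a sketch of that arithmetic only, not the definition that any later PR adopted; the function and parameter names are made up for the example, and reading 36 as a 32-byte txid plus a 4-byte output index is an interpretation, not something stated in the log.

```python
# Assumed breakdown of the 36 bytes per entry: 32-byte txid + 4-byte output index.
OUTPOINT_KEY_BYTES = 32 + 4

def naive_utxo_size(txout_count, total_value_bytes):
    """Database-neutral size estimate: fixed-size key per txout plus all value bytes."""
    return txout_count * OUTPOINT_KEY_BYTES + total_value_bytes
```

For instance, 1000 txouts with 50000 bytes of values gives 1000 * 36 + 50000 = 86000 bytes, whatever the on-disk database does with prefix compression.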