-
Notifications
You must be signed in to change notification settings - Fork 2.2k
discovery+lnwire: add support for DNS host name in NodeAnnouncement msg #9455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
discovery+lnwire: add support for DNS host name in NodeAnnouncement msg #9455
Conversation
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
3e93cdf
to
cb36d26
Compare
fb721bc
to
4a8adcb
Compare
c3a6a49
to
d67c686
Compare
3efa4b2
to
01c57c8
Compare
|
d48d1fd
to
98d2f2f
Compare
Thank you, @ellemouton, for your feedback on this PR. I have restructured the commits and added unit tests for encoding/decoding the DNS hostname address, as well as for parsing it. Apologies for the significant delay between receiving your review and addressing it—I’ll do my best to minimize such gaps in the future, in line with the Any follow-up feedback would be very much appreciated! PS: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR :)
I think its a good idea to test/ensure the following:
- make sure the node can handle providing its own DNS addr
- make sure the node can handle receiving a DNS addr from other nodes
The main point is that an IP address like For example, using a library such as miekg/dns with code like: dns.IsDomainName("500.0.0.0") returns This can feel odd in certain cases, such as when a user updates their announcement with that address and sees it added. For example: lncli peers updatenodeannouncement --address_add 500.0.0.0 This might seem strange to the user like when I opened that Issue #9816 for the changes this PR introduce. So, it’s unclear whether this should be considered a feature or a bug 😅. However, based on both DNS standard rules and Bolt-07 rules, this is objectively valid since the hostname consists of ASCII characters within the valid length limit. EDIT: |
80b75d8
to
cca9ffc
Compare
Hey @ellemouton, I’ve addressed the feedback and added a few follow-up questions for clarification. Would you be able to take another look when you have some time? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work @mohamedawnallah !
Left some comments.
I did some testing of this (after first rebasing on #9856) and can confirm it works nicely!
From the testing I can see that: if a node has a node announcement stored that has a DNS addr inside and then starts up, we still cant see it and still see unknown network for unrecognized address type
. However, this then gets fixed as soon as we get an updated node address from the node in question since then we go overwrite all its old addresses and properly use the new type.
So I think we could actually get away with doing the migration in a separate PR to this one since I think it is not critical to migrate this data since things will be consistent eventually anways if the node announcement is received again. All it means is that if we were to ship with only this change sans the migration then if a node has a DNS addr persisted of another node, it wont be able to properly read/see that until it receives a new node announcement to override that.
I think it would be good to get a second opinion on this point :)
@@ -31,6 +31,9 @@ const ( | |||
// opaqueAddrs denotes an address (or a set of addresses) that LND was | |||
// not able to parse since LND is not yet aware of the address type. | |||
opaqueAddrs addressType = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that we cannot implement the DB logic after the lnwire logic. we need to do it the other way around unless we are explicitly handing the new type. Why? cause we always have to be ready to revert to any commit & should not assume that these commits are all-or-nothing.
If we currently only have the lnwire logic before this commit then we'd hit ErrUnknownAddressType
at persistence time. do you see why?
approach 1
If we did want to handle this on an lnwire level and not yet at a persistence level then we'd need to be smart and convert any DNS
addrs to OpaqueAddr
type (or potentially edit any existing opaque addrs if there already exist some) at the time of converting lnwire.NodeAnn
to models.LightningNode
. Then, at the time of converting from models.LightningNode
to lnwire.NodeAnn
, we'd need to do some introspection of any OpaqueAddr to see if it potentially starts with a DNSAddr that we should extract.
what is nice about this approach, is that then we can fully support this on the lnwire
level in 1 PR without touching the DB! Then we can worry about any DB migrations afterwards (ie, we can split this into 2 PRs)
approach 2
If we did immediately want to support the new addr being persisted under a new type then we'd need to do things as follows:
- first add all the new DB read/write logic for the new DB type and write a migration that looks through all current nodes with addresses under the
opaqueAddrs
db type, potentially extract the DNS addrs from there, then re-write them as separate address types. we'd still convert any DNSAddrs to Opaque type at the time of converting from models.LightningNode -> lnwire.NodeAnn. - then we can add DNSAddr types to the lnwire.NodeAnn struct and add all the encoding/decoding logic there.
This approach would mean doing it all in 1 PR.
graph/db/addr.go
Outdated
opaqueAddrs addressType = 4 | ||
|
||
// dnsHostnameAddr denotes a DNS hostname address. | ||
dnsHostnameAddr addressType = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stored as type 4 would incorrectly end up in the opaque address bucket instead of being properly recognized as DNS addresses (type 5). Is my understanding correct?
note: be careful not to confuse lnwire
level types with DB types here. When you mention type 5 - this is the protocol level type for storing DNS addrs in the node_ann message.
When you say type 4 - here we are talking about a DB level address type. Currently we write any unknown address bytes into this type 4 bucket and we call it an "opaqueAddress" which might under the hood be many addresses (up until now we would not know cause we didnt know how to parse it).
graph/db/addr.go
Outdated
opaqueAddrs addressType = 4 | ||
|
||
// dnsHostnameAddr denotes a DNS hostname address. | ||
dnsHostnameAddr addressType = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n our case, without migration, there would be parsing errors since DNS addresses currently stored as type 4 would incorrectly end up in the opaque address bucket instead of being properly recognized as DNS addresses (type 5). Is my understanding correct?
hmm no i dont think there will be parsing errors now that i think about it. It's about inconsistencies. since right now we might have node's with dns addrs persisted but we would have those under the opaque type & so at read time, we'd keep them as OpaqueAddrs instead of properly having them under our new DNSAddr type... but perhaps that isnt actually a problem right now - as long as the original lnwire.NodeAnn
still looks the same once serialised.
Ideally ideally we'd like consistency, so we'd. want to iterate over all opque addrs, see if they maybe contain a DNS type, extract that and then re-persist it under the new dns type and then re-persist any remaining opaque bytes under the opaque type.
graph/db/addr.go
Outdated
opaqueAddrs addressType = 4 | ||
|
||
// dnsHostnameAddr denotes a DNS hostname address. | ||
dnsHostnameAddr addressType = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Check addresses for any DNS addresses incorrectly stored as type 4
// Multiple DNS addresses need to be handled?
for i, addr := range node.Addresses {
// Check if this is a DNS address incorrectly stored as type 4
if addr.Type == 4 && isDNSAddress(addr) {
// Update to type 5 (DNS hostname address type)
node.Addresses[i].Type = 5
needsUpdate = true
}
}
almost but not quite. Remember that an opaque Addr can potentially contain multiple addrs. So it would be a matter of trying to extract a DNS addr from it. THis might still leave over other bytes that are still considered Opaque (example: we dont support Websocket addresses yet so it might contain those).
graph/db/addr.go
Outdated
opaqueAddrs addressType = 4 | ||
|
||
// dnsHostnameAddr denotes a DNS hostname address. | ||
dnsHostnameAddr addressType = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So would this migration script be located in something like channeldb/migration34 And from a user perspective, they would typically learn about this migration through the release notes?
yes and yes
fe9a5e3
to
f8584dd
Compare
Thanks @ellemouton for the thoughtful feedback in the second review round. I've addressed your feedback and added a few comments where further clarification was needed. I haven't included the migration script in this PR yet may be in a follow-up PR as you suggested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love how much test-oriented this PR is! My main question is about the usage of the DNS - from a user's perspective, do we allow more than one DNS to be saved on disk? And how can a user update its DNS record?
} | ||
|
||
// Check if the hostname resembles an IPv4 address. | ||
if IsIPv4Like(d.Hostname) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can use if net.ParseIP(host) != nil {
instead, to rule out both v4 and v6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will rule out the DNS hostname as well since net.ParseIP("example.com")
is nil. This comment is also related #9455 (comment)
} | ||
|
||
// Split hostname into labels and validate each. | ||
labels := strings.Split(d.Hostname, ".") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should error out when len(labels) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So currently len(labels) == 0
can't happen since we check at the beginning if hostname is empty:
// Ensure the hostname is not empty.
if len(d.Hostname) == 0 {
return errors.New("hostname cannot be empty")
}
and even if Hostname
is empty it doesn't return an empty array:
gore> strings.Split("", ".")
[]string{
"",
}
lnwire/dns_addr.go
Outdated
// symbols, is invalid. BOLT 07 enforces this rule to ensure | ||
// node addresses are compatible with DNS standards. | ||
for _, ch := range label { | ||
if !((ch >= 'a' && ch <= 'z') || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's unicode.IsLetter
and unicode.IsDigit
that can be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use unicode.IsDigit
but we can't use unicode.IsLetter
since it includes non-ascii characters e.g:
fmt.Println(unicode.IsLetter('A')) // true
fmt.Println(unicode.IsLetter('Ω')) // true
fmt.Println(unicode.IsLetter('你')) // true
fmt.Println(unicode.IsLetter('ل')) // true
return netCfg.ResolveTCPAddr("tcp", hostPort) | ||
// For loopback or IP addresses: Use ResolveTCPAddr to properly | ||
// resolve these through Tor or other proxies, preventing IP leakage. | ||
if lncfg.IsLoopback(host) || isIP(host) || lnwire.IsIPv4Like(host) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: net.ParseIP
should be enough? why isIP
and then IsIPv4Like
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
net.ParseIP
returns either a non-nil or nil value:
- If it's non-nil, the input is a valid IPv4 or IPv6 address.
- If it's nil, the input could be either a hostname or an invalid IP address.
The problem is that net.ParseIP
doesn't distinguish between hostnames and invalid IPs — both fall into the nil category. So, an address like 500.0.0.0
(which looks like an IP but isn't valid) would be treated as a syntactically valid DNS hostname, which is misleading.
That's why isIP(host)
is called first — to short-circuit known valid IPs.
If that returns false (meaning it's either an invalid IP or a hostname), we then use IsIPv4Like(host)
to check if it's something that looks like an IPv4 address but isn't valid (e.g., 500.0.0.0
). Then we parse as an IP and eventually it would resolution errors later, such as when calling netCfg.ResolveTCPAddr("tcp", hostPort)
.
Finally, if it’s neither an IPv4/IPv6 (valid/invalid), we treat it as a DNS hostname.
Overall the goal of isIP
and then IsIPv4Like
and not simply net.ParseIP
is to make sure we don't accept addresses like 500.0.0.0
as DNS. Although IPv4 like addresses will be rejected here anyway DNSAddr.validate
I thought when user adds something like this they typically adding IPv4 address so it is better to be handled by netCfg.ResolveTCPAddr
.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If 500.0.0.0
is a DSN that will fail the resolution in the future, why do we need to know if it's "like an IPv4"? I think with this function, we're introducing some unclear and ambiguous logic, bc I need to stop, go down to the function, and learn what "like an IP" means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to know if it's "like an IPv4"
To avoid introducing issues like this in future #9816? Not 100% sure if that is considered to be a real concern whether we get response added even for invalid IP addresses
My understanding is that it would be weird to accept adding that address in the node announcement since it is invalid IP address but valid DNS address
$ lncli --network=regtest peers updatenodeannouncement --address_add 500.0.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to know if it's "like an IPv4"
To avoid introducing issues like this in future #9816? Not 100% sure if that is considered to be a real concern whether we get response added even for invalid IP addresses
My understanding is that it would be weird to accept adding that address in the node announcement since it is invalid IP address but valid DNS address
$ lncli --network=regtest peers updatenodeannouncement --address_add 500.0.0.0
What do you think about this @GustavoStingelin or my comment was still unclear why we need IPLike
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think eliminating that IPLike
will help present clarity to the code so I think that is a little tradeoff to take. I am gonna remove it and let that lookup fails at runtime when needed as you said 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could always call netCfg.ResolveTCPAddr after confirming the address is not an Onion address. This function handles DNS resolution when the provided address is not a valid IP and returns an error if the resolution fails. With this approach, parseAddr will return an error when appropriate, preventing the server from starting with an invalid address. What do you think of this approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could always call netCfg.ResolveTCPAddr after confirming the address is not an Onion address
I think Elle previously suggested that here (#9455 (comment)). My response was that caching DNS resolution results and resolving DNS to IP addresses at the time of calling netCfg.ResolveTCPAddr
could lead to issues, since these IP addresses might be behind load balancers and could change over time. For more context, please see my detailed response to their question. What are your thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. My idea is to use the resolver only to check if the DNS is valid, without resolving it to actual IP addresses. The algorithm would look like this:
if isOnion {
return OnionAddr
}
if isIPv4OrIPv6 {
return TCPAddr
}
// It's a DNS. We create a new DNS struct, but first, we validate if the DNS is reachable
if err := netCfg.ResolveTCPAddr(...); err != nil {
return err // DNS resolution failed, so we return an error
}
// Other validations for local IPs could go here, but adding them might overload the function with too many responsibilities
// DNS is valid
return DNSAddr
but on the other hand, we have the function lncfg.ParseAddressString that does similar logic, working on #8801 I feel that perhaps we should refactor both functions to improve consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea is to use the resolver only to check if the DNS is valid, without resolving it to actual IP addresses
check if the ip address behind given address
Okay I identified a gap in my understanding regarding ResolveTCPAddr
. Initially, I thought it checked service availability by verifying if the IP behind a given address was responsive (e.g., a ping/pong mechanism). However, this isn't correct. In reality, ResolveTCPAddr
simply resolves the domain name by looking up DNS records:
- A records for IPv4 addresses
- AAAA records for IPv6 addresses
Clarifying this simplifies much of the DNS logic code you pointed out, and that Elle and YY previously discussed.
"multiple DNS addresses. See " + | ||
"Bolt 07") | ||
} | ||
dnsAddrIncluded = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's important to distinguish that the node cannot announce more than one DNS vs the node cannot save more than one DNS - the restriction one the announcement should happen in the gossip, not here?
Even if we want to restrict only one DNS to be persisted on disk, I don't think this approach works since after a restart the user can save another DNS record again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after a restart the user can save another DNS record again.
Great point!
the restriction one the announcement should happen in the gossip, not here?
Will check the correct place for adding that one DNS net address constraint in the gossip 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the restriction one the announcement should happen in the gossip, not here?
not sure i agree: the lnwire
package defines valid protocol level messages and it is invalid for a node_announcement message to contain more than 1 DNS addr & so i think it makes sense to fail at decode time if this is violated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this approach works since after a restart the user can save another DNS record again.
Not sure i follow this - can you expand?
f8584dd
to
015c776
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we allow more than one DNS to be saved on disk?
That is a good observation. Currently when node restarts the user can save more than one DNS technically to the disk this need to be handled somehow on the gossip layer as you mentioned
how can a user update its DNS record?
They could do replace_operation by doing remove_operation + add_operation on the updatenodeannouncement
level so if they have example.com
and wanna add example.org
they could do:
dev@dev:~$ lncli --network=regtest peers updatenodeannouncement --address_remove example.com
{
"ops": [
{
"entity": "addresses",
"actions": [
"example.com:9735 removed"
]
}
]
}
dev@dev:~$ lncli --network=regtest peers updatenodeannouncement --address_add example.org
{
"ops": [
{
"entity": "addresses",
"actions": [
"example.org:9735 added"
]
}
]
}
// Lightning Network's BOLT 07 specification requirements. It ensures the | ||
// hostname is not empty, doesn't exceed 255 characters, contains only ASCII | ||
// characters, and that the port number is within the valid TCP range (1-65535). | ||
func (d *DNSAddr) validate() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still dont think we need all this validation on this layer. I think it's ok to do all the validation for our own users hostname string at config load time or on the update-nod-ann rpc functionality. But the spec only says "The hostname must be ASCII and not longer than 258 chars" so imo that's the only validation to do here.
wdyt @yyforyongyu ?
"multiple DNS addresses. See " + | ||
"Bolt 07") | ||
} | ||
dnsAddrIncluded = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the restriction one the announcement should happen in the gossip, not here?
not sure i agree: the lnwire
package defines valid protocol level messages and it is invalid for a node_announcement message to contain more than 1 DNS addr & so i think it makes sense to fail at decode time if this is violated.
"multiple DNS addresses. See " + | ||
"Bolt 07") | ||
} | ||
dnsAddrIncluded = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this approach works since after a restart the user can save another DNS record again.
Not sure i follow this - can you expand?
} | ||
|
||
length := 1 + uint16(hostnameLen) + portLen | ||
addrBytesRead += length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also DNS doesn't have mapping here since it has variable hostname
note that in the code snippet i gave, AddrLen
was adapted to take in a param to account for the variable length.
perform DNS validation in a centralized pla
The spec only says we need to check for valid ascii. The other checks are not spec related.
s the difference between suggested one and existing
There are various other simplifications. Note that there is great merit in keeping the code style the same as the other cases in this switch as it makes the code more readable.
// representation. It writes the address type, hostname length, hostname, and | ||
// port (in big-endian order) to the writer. The function validates that the | ||
// DNS address. Returns an error if any part of the serialization fails. | ||
func encodeDNSAddr(w io.Writer, addr *lnwire.DNSAddr) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed offline & in previous review comment: it is going to be much simpler to not touch the persistence layer. Rather just dynamically extract DNS addrs at read time from the existing opaque type & similarly at write time, add any DNS types to the opaque stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because also, note that as it currently stands: any existing DNS addrs that we have stored already will currently be under the opaque addr type - so you have to account for those at read-time anyways.
// dnsAddr denotes a DNS hostname address. | ||
dnsAddr addressType = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this is unused in this commit
@mohamedawnallah, remember to re-request review from reviewers when ready |
015c776
to
90e2116
Compare
I now switched my full attention to this PR. It is not yet ready for review. I am putting together a design doc scoping the changes in this PR |
Change Description
Closes #6337.
Closes #9126.
Steps to Test
Steps for reviewers to follow to test the change.
Pull Request Checklist
Testing
Code Style and Documentation
[skip ci]
in the commit message for small changes.📝 Please see our Contribution Guidelines for further guidance.