Alexandr Nedvedicky
RST for loopback
Problem statement
IPF has been able to intercept packets on the loopback interface (lo0) since PF-Hooks were introduced. Unfortunately, due to CR 6747420, users could only make IPF eat (block) such packets. Rules that are supposed to respond with RST/ICMP, such as the block return-rst kind of rule, ended up dropping the packet silently on loopback.
Problem analysis
Let's look at how a block return-rst rule would act on a packet coming in over a physical network interface (NIC).
- The packet is intercepted and delivered to IPF via the NH_PHYSICAL_IN packet filtering hook.
- The hook (packet event) processing in IPF happens in the fr_check() function.
- fr_check() calls fr_firewall() to find a matching rule. fr_firewall() finds the matching rule for the packet and tells fr_check() what to do with it:
  - The packet is O.K. and should be forwarded to the IP stack as is,
  - The packet should be blocked (eaten) silently,
  - The packet should be eaten and the sender should be notified either by a RST packet or by an ICMP UNREACH message.
- Since we are dealing with a block return-rst kind of rule, fr_check() has to block the packet and send a RST packet as a response to the sender. fr_check() uses fr_send_reset() to send the RST response.
- fr_send_reset() constructs a brand new packet and uses the net_inject() function from the netinfo module to push the RST response to the network. Among the parameters required by net_inject() there is a NIC index number, which identifies the particular physical NIC to be used to send the RST packet. Please keep that important detail in mind as you read on.
- Once the RST packet is sent, fr_check() just eats/blocks/releases (use whatever term you want) the offending packet, which triggered the rule. It finally returns to the caller, and the packet event is processed.
The same fr_check() is invoked for a packet intercepted on the loopback interface. In this case everything works fine up to net_inject(). net_inject() really needs a physical NIC to send the RST packet, and the catch is that there is no physical NIC for a packet intercepted on loopback. net_inject() is therefore not able to send the RST response to loopback.
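The flow above, and the point where it breaks on loopback, can be sketched roughly as follows. This is not the IPF source: the types and the functions apply_verdict() and inject_reply() are hypothetical stand-ins for the tail of fr_check() and for net_inject(), and "NIC index 0 means no physical NIC" is an assumption made only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts; the real IPF code encodes the result differently. */
typedef enum { VERDICT_PASS, VERDICT_BLOCK, VERDICT_BLOCK_RETURN } verdict_t;

/* Hypothetical packet summary used only in this sketch. */
typedef struct {
    unsigned nic_index;  /* physical NIC the packet arrived on, 0 = none (loopback) */
    bool     reply_sent; /* set when a RST/ICMP reply was injected */
} pkt_t;

/* Stand-in for net_inject(): it can only send through a physical NIC. */
static bool inject_reply(pkt_t *pkt)
{
    assert(pkt != NULL);
    if (pkt->nic_index == 0)
        return false;            /* nowhere to send the reply */
    pkt->reply_sent = true;
    return true;
}

/*
 * Stand-in for the tail of fr_check(): act on the verdict the rule
 * lookup produced. Returns true if the packet goes on to the IP stack,
 * false if it is eaten.
 */
static bool apply_verdict(pkt_t *pkt, verdict_t verdict)
{
    assert(pkt != NULL);
    switch (verdict) {
    case VERDICT_PASS:
        return true;
    case VERDICT_BLOCK_RETURN:
        (void)inject_reply(pkt); /* fails silently on loopback */
        return false;
    case VERDICT_BLOCK:
    default:
        return false;
    }
}
```

With a packet from a physical NIC the reply is sent and the packet is eaten. With a loopback packet the packet is still eaten, but no reply ever leaves, which is exactly the behaviour described above.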
Problem impact
Not sending the RST packet causes TCP loopback clients to time out. If there is no response to the initial SYN packet, the client keeps retransmitting SYNs until the connection timeout expires (usually 30 secs. of eternity). Once the timeout elapses, the client reports an error with the ETIMEDOUT code.
On the other hand, if a RST packet were sent to the loopback client in response to its SYN packet, there would be no 30 sec. delay; the client would return immediately with ECONNREFUSED.
Possible solutions
Since both the loopback client and IPF run on the same box, we need to find a way for IPF to let the TCP client know the connection was reset. Unfortunately there is currently no way for IPF to pass a 'return value' to the client's connect() call. Even if fr_check()'s return value were processed by the PF-Hooks module, there is another obstacle imposed by ip_input(), which stuffs packets into the hook. ip_input() returns void, so even if a return value were propagated from fr_check(), it would disappear once the code leaves ip_input().
Another option is to introduce a new function in the netinfo module, which would allow IPF to send ICMP/RST packets even to local clients. This is probably clean from the design point of view, but the overall code change would be too complex for such a small bugfix.
The third option is to try to turn the actual packet into a response packet (RST/ICMP) and forward it to the IP stack. This looks like a simple fix, which might work.
Turning packet into response
The first trial proves this will work. The only question is: why does it work for loopback? Let's look at the sources to see how loopback traffic is handled by the IP module. Once a user launches the command telnet localhost 22 in order to connect to localhost:22, the connect(3SOCKET) socket API calls the kernel connect. Many things start to happen in the kernel then. Whatever goes on inside connect(), the packet from telnet localhost 22 ends up in the ip_wput_ire() function. This is a pretty long, spaghetti kind of code right now (snv_116). The mp mblk_t message is checked back and forth there to find a way to process it further. Since our mp 'comes' from telnet localhost 22, the function is driven to the nullstq: label, where the loopback mp is processed.
Both branches do the same things with the outbound mp:
- They invoke the PF_HOOK loopback-out event
- And if the packet is not eaten by IPF (firewall) in the event, they subsequently pass the mp to ip_wput_local(). Once the mp enters ip_wput_local() it is considered an inbound packet.
The very first thing ip_wput_local() does is stuff the mp into the loopback-in event. The event delivers the packet to IPF (a firewall in general) again. This time the packet is seen as inbound.
If the inbound packet (mp) is not eaten by IPF (firewall), it is processed further down in ip_wput_local(). Since we are dealing with a SYN packet, the function will fan the packet out to the TCP queue.
ip_fanout_tcp() will try to find a matching endpoint for the given TCP packet. The endpoint in general can be a matching conn_t record bound to a socket in one of these states:
- listen
- SYN sent
- established
The match is made on:
- source and destination address
- source and destination port
- (sequence numbers)
For our SYN packet there are two possible outcomes:
- TCP will answer with SYN ACK and notify the application listening on port 22 about the new incoming connection, if there is a matching listen socket
- TCP will answer with RST if there is no socket listening on port 22.
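To make the lookup a bit more tangible, here is a minimal sketch of the idea. It is not the ip_fanout_tcp() code: endpoint_t, find_endpoint() and reply_to_syn() are hypothetical names, the connection table is a plain array, and a local address of 0 is assumed to mean a wildcard listener. Sequence numbers are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { ST_LISTEN, ST_SYN_SENT, ST_ESTABLISHED } ep_state_t;
typedef enum { REPLY_SYN_ACK, REPLY_RST } syn_reply_t;

/* Hypothetical, heavily reduced stand-in for a conn_t record. */
typedef struct {
    ep_state_t state;
    uint32_t   laddr, faddr;   /* local and foreign address */
    uint16_t   lport, fport;   /* local and foreign port */
} endpoint_t;

/* Find the endpoint for a segment: a full match wins over a listener. */
static const endpoint_t *find_endpoint(const endpoint_t *tab, size_t n,
    uint32_t src, uint32_t dst, uint16_t sport, uint16_t dport)
{
    const endpoint_t *listener = NULL;

    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const endpoint_t *ep = &tab[i];

        if (ep->lport != dport)
            continue;
        if (ep->state == ST_LISTEN) {
            if (ep->laddr == 0 || ep->laddr == dst)
                listener = ep;
        } else if (ep->laddr == dst && ep->faddr == src &&
            ep->fport == sport) {
            return ep;
        }
    }
    return listener;
}

/*
 * Decide how TCP answers a SYN. Simplification: a SYN that matches a
 * non-listening endpoint is treated like "no listener" here.
 */
static syn_reply_t reply_to_syn(const endpoint_t *tab, size_t n,
    uint32_t src, uint32_t dst, uint16_t sport, uint16_t dport)
{
    const endpoint_t *ep = find_endpoint(tab, n, src, dst, sport, dport);

    return (ep != NULL && ep->state == ST_LISTEN) ? REPLY_SYN_ACK : REPLY_RST;
}
```

With only a listener on port 80 in the table, a SYN to port 80 gets SYN ACK, while a SYN to port 22 gets RST.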
Just to briefly recap: IPF will see two events for the SYN packet:
- loopback-out
- loopback-in
The proposed fix alters the SYN packet: it turns it into a RST packet. It's very simple:
- source and destination ports are swapped
- sequence number is turned into an acknowledgment number
- TCP flags are set
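The three steps can be sketched like this. It is only an illustration under a few assumptions: tcp_sketch_t is a hypothetical, host-byte-order stand-in for the real TCP header, the flag set RST|ACK and the "+1" for the SYN follow the usual reset rules of RFC 793 rather than anything confirmed from the fix itself, and checksum recomputation (plus any address swap) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TCP flag bits as they appear in the header. */
enum { TCPF_FIN = 0x01, TCPF_SYN = 0x02, TCPF_RST = 0x04, TCPF_ACK = 0x10 };

/* Hypothetical, reduced TCP header in host byte order. */
typedef struct {
    uint16_t sport, dport;
    uint32_t seq, ack;
    uint8_t  flags;
} tcp_sketch_t;

/* Turn a payload-free SYN into the RST that answers it, in place. */
static void syn_to_rst(tcp_sketch_t *th)
{
    uint16_t port;

    assert(th != NULL);
    assert((th->flags & TCPF_SYN) != 0);

    /* 1. swap source and destination ports */
    port = th->sport;
    th->sport = th->dport;
    th->dport = port;

    /* 2. the sequence number becomes the acknowledgment number;
     *    the SYN itself consumes one sequence number (assumption: RFC 793) */
    th->ack = th->seq + 1;
    th->seq = 0;

    /* 3. set the TCP flags of the reply */
    th->flags = TCPF_RST | TCPF_ACK;
}
```

For a SYN from port 40000 to port 22 with sequence number 1000, the result is a RST|ACK from port 22 to port 40000 acknowledging 1001.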
It's just my design choice to limit this fix to the inbound event only. It covers 95% of use cases, since most people prefer rules that match inbound packets.
How does it actually work?
Putting the pieces together makes the picture complete. telnet localhost 22 sends a SYN packet, which is later intercepted by the loopback-in event. The event passes the packet to fr_check(). fr_check() finds out the packet must be turned into a RST and turns it into a RST response. Then the loopback-in event completes.
Once FW_HOOK() in ip_wput_local() returns, the stack no longer handles the SYN packet but the RST response to it. The RST response will be fanned out to TCP.
tcp_fan_out() will get the RST in mp. It will find a matching connection record bound to the socket owned by the telnet localhost 22 process and pass the RST packet there. This will trigger an event, which notifies the application that no one is listening. The event will resume the connect() socket call, which will return the ECONNREFUSED error to the application. So that's the magic trick.
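Seen from the client, the difference between the old and the new behaviour boils down to the toy model below. It is a sketch only: seg_t, loopback_in_hook() and client_connect_result() are hypothetical names, and the hook is assumed to always match a block return-rst rule.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SEG_SYN, SEG_RST } seg_kind_t;
typedef enum { HOOK_PASS, HOOK_EATEN } hook_result_t;

/* Hypothetical one-field stand-in for the mp carrying the segment. */
typedef struct { seg_kind_t kind; } seg_t;

/* Model of the loopback-in event hitting a block return-rst rule. */
static hook_result_t loopback_in_hook(seg_t *seg, bool fix_enabled)
{
    assert(seg != NULL);
    if (!fix_enabled)
        return HOOK_EATEN;   /* old behaviour: packet silently dropped */
    seg->kind = SEG_RST;     /* new behaviour: rewrite in place ... */
    return HOOK_PASS;        /* ... and let the stack carry on */
}

/* What connect() eventually reports; 0 means the handshake continues. */
static int client_connect_result(seg_t syn, bool fix_enabled)
{
    if (loopback_in_hook(&syn, fix_enabled) == HOOK_EATEN)
        return ETIMEDOUT;    /* nobody ever answers the SYN */
    return syn.kind == SEG_RST ? ECONNREFUSED : 0;
}
```

Without the fix the model yields ETIMEDOUT, with the fix it yields ECONNREFUSED, matching the two client outcomes described earlier.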
A similar approach is implemented for UDP: UDP packets are turned into ICMP port unreachable messages.
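For the curious, the ICMP part of such a message looks roughly like the sketch below. Note the assumptions: the fix rewrites the packet in place, while this example builds the ICMP message into a separate buffer for clarity; build_port_unreach() and sketch_cksum() are hypothetical helpers; only IPv4 is handled. The layout itself (type 3, code 3, four unused bytes, then the original IP header plus the first 8 bytes of the datagram) is the standard one from RFC 792.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ICMP_TYPE_UNREACH = 3, ICMP_CODE_PORT = 3, ICMP_HDR_LEN = 8 };

/* Standard Internet checksum over len bytes, big-endian 16-bit words. */
static uint16_t sketch_cksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)(buf[len - 1] << 8);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Build the ICMP part of a port-unreachable reply for the IPv4 datagram
 * in orig. Returns the number of bytes written to out, or 0 on error.
 */
static size_t build_port_unreach(const uint8_t *orig, size_t orig_len,
    uint8_t *out, size_t out_len)
{
    size_t ihl, quoted, total;
    uint16_t ck;

    assert(orig != NULL && out != NULL);
    if (orig_len < 20)
        return 0;
    ihl = (size_t)(orig[0] & 0x0f) * 4;      /* IPv4 header length */
    if (ihl < 20 || orig_len < ihl + 8)
        return 0;
    quoted = ihl + 8;                        /* IP header + UDP header */
    total = ICMP_HDR_LEN + quoted;
    if (out_len < total)
        return 0;

    memset(out, 0, ICMP_HDR_LEN);            /* checksum and unused = 0 */
    out[0] = ICMP_TYPE_UNREACH;
    out[1] = ICMP_CODE_PORT;
    memcpy(out + ICMP_HDR_LEN, orig, quoted);

    ck = sketch_cksum(out, total);
    out[2] = (uint8_t)(ck >> 8);
    out[3] = (uint8_t)(ck & 0xff);
    return total;
}
```

A receiver can validate the result by running the same checksum over the finished message, which then comes out as zero.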
Side effects
The only side effect introduced by this fix can be observed by sniffing on lo0 with snoop. Once a return-rst rule is in use, snoop will see the RST response packet only. It won't see the actual SYN packet, which provoked the RST response.
Limitations
The fix is able to deal with SYN and FIN packets only. These packets are always routed through the IP stack. Once a loopback connection becomes established, the OS might fuse the connection. Once the connection is fused, the data (packets) no longer reach the IP level; they are just exchanged between the sockets without any lookup in the connection tables. Therefore the fix explicitly checks that the packets are either SYN or FIN, since these are always unfused. More information about TCP fusion can be found in the sources.
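The check itself is tiny. A sketch, assuming the standard TCP flag bit values and a hypothetical function name rewrite_candidate():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP header flag bits. */
enum { FLAG_FIN = 0x01, FLAG_SYN = 0x02, FLAG_ACK = 0x10 };

/* Only SYN or FIN segments are guaranteed to pass through the IP level. */
static bool rewrite_candidate(uint8_t tcp_flags)
{
    return (tcp_flags & (FLAG_SYN | FLAG_FIN)) != 0;
}

/* Quick self-check of the predicate. */
static void rewrite_candidate_selfcheck(void)
{
    assert(rewrite_candidate(FLAG_SYN));
    assert(rewrite_candidate(FLAG_FIN | FLAG_ACK));
    assert(!rewrite_candidate(FLAG_ACK));   /* plain data/ACK segment */
}
```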
Posted at 03:30PM Jun 16, 2009 by Alexandr Nedvedicky in Personal