Alexandr Nedvedicky
Packet gets blocked even though it should pass
From time to time I see desperate admins seeking advice on why a packet gets blocked even though it should be allowed in. So let me introduce the typical symptoms first. An IPF admin checks the IPF log from time to time, and it might look as follows:
22/04/2009 22:12:40.027573 bge0 @0:45 b 192.168.1.1,45661 -> 172.16.30.30,25 PR tcp len 20 52 -S IN
22/04/2009 22:12:50.159151 bge0 @0:45 b 192.168.1.1,45661 -> 172.16.30.30,25 PR tcp len 20 52 -S IN
22/04/2009 22:13:03.670300 STATE:NEW 192.168.1.1,45661 -> 172.16.30.30,25 PR tcp
As you can see, there is a SYN packet, which initiates an SMTP connection from the client (192.168.1.1) to the mail server (172.16.30.30). What's really strange is that the packet is blocked two times (22:12:40, 22:12:50), while the third time it is let in (22:13:03). It looks like IPF finally makes up its mind and lets the packet in. To investigate the issue further we need to get the ruleset that is in use.
Let's interrupt our investigation for a while and examine the log entries we have. The first two lines are the same (except for the times). So let's try to read them in human words:
- 22/04/2009 22:12:40.027573 bge0 something interesting happened on Apr 22 2009 ... on the bge0 interface.
- @0:45 this describes the rule: it was rule number 45 in group 0 (the root group).
- b the rule blocked the packet.
- 192.168.1.1,45661 -> 172.16.30.30,25 PR tcp len 20 52 -S it was a TCP SYN packet (-S) coming from the client to the SMTP server. The IP header was 20 octets long and the whole datagram 52 octets (len 20 52).
- IN it was an inbound packet.
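The field-by-field reading above can be sketched as a small parser. This is only an illustration: the regular expression and the field names (hdrlen, pktlen and so on) are my own labels, and it assumes the two numbers after len are the header length and the total packet length.

```python
import re

# Pattern for a packet entry shaped like the log lines quoted above.
# All group names are illustrative labels, not identifiers used by IPF.
LOG_RE = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<iface>\S+) "
    r"@(?P<group>\d+):(?P<rule>\d+) (?P<action>\S+) "
    r"(?P<src>[^,\s]+),(?P<sport>\d+) -> (?P<dst>[^,\s]+),(?P<dport>\d+) "
    r"PR (?P<proto>\S+) len (?P<hdrlen>\d+) (?P<pktlen>\d+) "
    r"-(?P<flags>\S+) (?P<direction>IN|OUT)"
)

NUMERIC_FIELDS = ("group", "rule", "sport", "dport", "hdrlen", "pktlen")


def parse_entry(line):
    """Return a dict of fields for a packet entry, or None for other lines."""
    match = LOG_RE.match(line)
    if match is None:
        return None  # e.g. a STATE:NEW line, which has no rule reference
    entry = match.groupdict()
    for key in NUMERIC_FIELDS:
        entry[key] = int(entry[key])
    return entry


if __name__ == "__main__":
    sample = ("22/04/2009 22:12:40.027573 bge0 @0:45 b 192.168.1.1,45661 -> "
              "172.16.30.30,25 PR tcp len 20 52 -S IN")
    print(parse_entry(sample))
```

Counting entries per rule number and action with such a parser is a quick way to spot a port that is sometimes passed and sometimes blocked.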
The third line tells us the packet had to match some state rule (a rule with keep state). So the packets that belong to this SMTP session will be let in without inspecting the rules. The presence of the state entry created here (STATE:NEW) is enough to let such packets in. The description of the stateful filter is out of scope here. Read either the official docs from Sun or the ancient IPF-Howto.
Let's get back to our crazy (nondeterministic) IPF. Since the log told us something strange has happened, we want to investigate the problem further. A good start is to get the ruleset being used by IPF. Use the command ipfstat -ionh to do that. The command prints out all rules (inbound, the i switch; outbound, the o switch) together with their rule numbers (the n switch) and the number of times each rule was hit by packets (the h switch; the hit count is printed at the beginning of the line). So let's run the ipfstat -ionh command on our crazy firewall:
#ipfstat -ionh
....
305 @35 pass in log quick proto tcp from any to any port = ssh flags S/FSRPAU keep state keep frags
343112 @36 pass in log quick proto tcp from any to any port = smtp flags S/FSRPAU keep state keep frags
1110 @37 pass in log quick proto tcp from any to any port = imap flags S/FSRPAU keep state keep frags
357 @38 pass in log quick proto tcp from any to any port = imaps flags S/FSRPAU keep state keep frags
0 @39 pass in log quick proto tcp from any to any port = pop3s flags S/FSRPAU keep state keep frags
...
106230 @45 block return-rst in log quick from any to any
The rules above are a modified excerpt from a real-life scenario. As you can see, all SMTP packets are supposed to match inbound rule number 36. The number at the beginning of the line says the rule was matched by 343112 packets. So what's up with the rule? Why are there at least two SMTP packets that matched rule 45 (according to the log we have)?
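To pick the hit counters out of such a listing, something like the following sketch could be used. It assumes the output has one rule per line in the form "hits @number rule text"; the function name parse_rules is made up for this example.

```python
import re

# One rule per line: "<hit count> @<rule number> <rule text>".
RULE_RE = re.compile(r"^\s*(\d+)\s+@(\d+)\s+(.*)$")


def parse_rules(listing):
    """Return a list of (hits, rule_number, rule_text) tuples."""
    rules = []
    for line in listing.splitlines():
        match = RULE_RE.match(line)
        if match:  # skip the command line and elided parts of the listing
            hits, number, text = match.groups()
            rules.append((int(hits), int(number), text.strip()))
    return rules


if __name__ == "__main__":
    listing = """\
343112 @36 pass in log quick proto tcp from any to any port = smtp flags S/FSRPAU keep state keep frags
106230 @45 block return-rst in log quick from any to any
"""
    for hits, number, text in parse_rules(listing):
        print(number, hits, text)
```

Comparing two snapshots taken a few minutes apart shows which rules are actually being hit right now, which is more telling than the totals since boot.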
The first part of the answer lies in the way IPF processes a packet through the rules. IPF uses a so-called last match policy, which means the verdict of the last matching rule decides the packet's fate. By default IPF applies the rules to the packet one by one, and once all rules have been applied, the last matching one is used for the decision. So that's the reason why the SMTP packet matched rule 45: it's simply because rule 45 is the last rule and it matches any packet. This leads us to a question: is there any rule which would allow such an SMTP packet in? Yes, there is such a rule, it is rule number 36. The keyword quick in rule 36 ensures IPF will stop rule processing and use rule 36 as the last matching rule. The question now is: why does rule 36 sometimes match the packet and let it in, and sometimes not?
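The last match policy and the effect of quick can be illustrated with a few lines of Python. This is a toy model of the behaviour described above, not IPF's code; the rule tuples and helper names are invented for the example.

```python
def evaluate(rules, packet):
    """Return (rule number, action) of the rule that decides the packet's fate.

    Each rule is a tuple (number, action, quick, matches), where matches is
    a function taking the packet and returning True or False.
    """
    verdict = None
    for number, action, quick, matches in rules:
        if not matches(packet):
            continue
        verdict = (number, action)  # remember the last matching rule so far
        if quick:
            break  # quick: stop processing, this rule is the last match
    return verdict


def is_smtp(packet):
    return packet["proto"] == "tcp" and packet["dport"] == 25


def anything(packet):
    return True


# Rules 36 and 45 from the listing, reduced to what matters here.
WITH_QUICK = [(36, "pass", True, is_smtp), (45, "block", True, anything)]
# The same rules without quick: the final block rule always wins.
WITHOUT_QUICK = [(36, "pass", False, is_smtp), (45, "block", False, anything)]

if __name__ == "__main__":
    syn = {"proto": "tcp", "dport": 25}
    print(evaluate(WITH_QUICK, syn))
    print(evaluate(WITHOUT_QUICK, syn))
```

With quick the SMTP packet is decided by rule 36; without it the same packet walks on to rule 45 and gets blocked.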
To answer this we have to read the complete rule to its end. We see there is a keep state keyword at the end. It basically tells IPF to create an entry for the packet in the state table. All packets which belong to the particular SMTP session (we are still talking about rule 36) will then be allowed in based on the existence of that entry. No rule lookup will be performed for them. I'm not going to describe stateful inspection here, it's just another interesting topic.
Let's stick with our problem:
- we have three SMTP SYN packets
- two of them were blocked by rule 45
- third created a state entry => it had to match rule 36
We feel the state entry must be created in order to let the SMTP packet in. What will happen with the packet if such an entry can not be created? And can IPF fail to create a state entry at all? These questions are best answered by the IPF sources. We want to check the function fr_addstate(). Reading further down the code of the function we can see there is a check which ensures the fr_statemax limit is not exceeded. If the fr_statemax limit would be exceeded, fr_addstate() fails. So the answer is: yes, IPF can fail to create a state entry.
Let's look at what happens next, once fr_addstate() fails. Because of the quick keyword in the rule, fr_addstate() is called from the fr_scanlist() function, which processes the rules. You can see that when quick and keep state are both present in a rule, IPF will try to add a state. If it fails to add the state, it will bump a stat counter and continue with rule processing. This perfectly explains why some SMTP packets are let in while others are blocked. If there is enough space in the state table, the packet is allowed in. If there is no space to add a state entry, the SMTP packet will be dropped in our case, because it will inevitably match rule 45.
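The interplay between the state limit and rule processing can be modelled in a few lines. This is a deliberately simplified sketch of the behaviour described above, not a transcription of fr_scanlist() or fr_addstate(); the class, the function and the connection key are assumptions made for the example.

```python
class StateTable:
    """Toy state table with a hard limit, standing in for fr_statemax."""

    def __init__(self, limit):
        self.limit = limit
        self.entries = set()
        self.lost = 0  # bumped whenever a state can not be added

    def add(self, key):
        if len(self.entries) >= self.limit:
            self.lost += 1
            return False
        self.entries.add(key)
        return True


def filter_packet(packet, table):
    """Return a verdict string for an inbound TCP SYN packet."""
    key = (packet["src"], packet["sport"], packet["dst"], packet["dport"])
    if key in table.entries:
        return "pass (state)"  # existing session, no rule lookup needed
    # Rule 36: pass in quick ... port = smtp ... keep state
    if packet["dport"] == 25 and table.add(key):
        return "pass (rule 36)"
    # State could not be added (or the port did not match): processing
    # continues and ends at rule 45, block in quick from any to any.
    return "block (rule 45)"


if __name__ == "__main__":
    table = StateTable(limit=1)
    first = {"src": "10.0.0.1", "sport": 1111, "dst": "172.16.30.30", "dport": 25}
    second = {"src": "192.168.1.1", "sport": 45661, "dst": "172.16.30.30", "dport": 25}
    print(filter_packet(first, table))   # takes the only free slot
    print(filter_packet(second, table))  # table full
    table.entries.clear()                # old states expire
    print(filter_packet(second, table))  # the retransmitted SYN gets in
```

The retransmitted SYN from the log behaves the same way: it is blocked while the table is full and passes as soon as an older entry goes away.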
So far so good, but what's presented here is just an assumption. Is there a way for an IPF admin to verify such speculation? Yes, the admin can verify it's happening, however it's not a straightforward process. The commands ipfstat and ipf are the admin's friends.
The first step to take, once there is a suspicion that weird things are happening with state rules, is to use the ipfstat command:
#ipfstat
bad packets: in 0 out 0
IPv6 packets: in 0 out 0
input packets: blocked 118268 passed 37438678 nomatch 1875 counted 0 short 1
output packets: blocked 0 passed 41840335 nomatch 15501434 counted 0 short 0
input packets logged: blocked 118268 passed 37436802
output packets logged: blocked 0 passed 26338293
packets logged: input 0 output 0
log failures: input 11030071 output 12036919
fragment state(in): kept 0 lost 0 not fragmented 0
fragment state(out): kept 0 lost 0 not fragmented 0
packet state(in): kept 275972 lost 69024
packet state(out): kept 395327 lost 18498
ICMP replies: 106371 TCP RSTs sent: 11824
Invalid source(in): 0
Result cache hits(in): 0 (out): 0
IN Pullups succeeded: 117451 failed: 0
OUT Pullups succeeded: 54662 failed: 0
Fastroute successes: 118195 failures: 0
TCP cksum fails(in): 0 (out): 0
IPF Ticks: 1261270
Packet log flags set: (0)
none
We are mostly interested in these two lines from the ipfstat's output above:
packet state(in): kept 275972 lost 69024
packet state(out): kept 395327 lost 18498
This tells us there were 275972 states created for inbound packets in total, while IPF failed to create a state for 69024 inbound packets. The second line shows the same stats for outbound packets. The lost counter is the bad one: it is bumped any time fr_addstate() fails to add a state entry.
At this moment we are pretty confident IPF failed to insert a state entry into the state table. Is there a way to find out how many entries are present in the table? To tell that, one has to ask the ipfstat command in a more polite way, using the -s switch:
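To put the two counters in proportion, a small helper can compute the share of lost states per direction. This is a sketch only: it assumes kept plus lost equals the number of attempts to create a state, and the function name lost_ratio is invented here.

```python
import re

STATE_RE = re.compile(r"packet state\((in|out)\): kept (\d+) lost (\d+)")


def lost_ratio(ipfstat_output):
    """Return {"in": ratio, "out": ratio} of lost states to all attempts."""
    ratios = {}
    for direction, kept, lost in STATE_RE.findall(ipfstat_output):
        kept, lost = int(kept), int(lost)
        attempts = kept + lost  # assumption: every attempt is kept or lost
        ratios[direction] = lost / attempts if attempts else 0.0
    return ratios


if __name__ == "__main__":
    output = """\
packet state(in): kept 275972 lost 69024
packet state(out): kept 395327 lost 18498
"""
    for direction, ratio in lost_ratio(output).items():
        print(f"{direction}: {ratio:.1%} of state creations failed")
```

For the numbers above that is roughly one in five inbound attempts, which is hard to dismiss as noise.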
#ipfstat -s
IP states added:
545103 TCP
126221 UDP
0 ICMP
5704288 hits
73840026 misses
87310 maximum
0 no memory
4013 active
0 expired
0 closed
State logging enabled
State table bucket statistics:
16 in use
0 max bucket
0.28% bucket usage
0 minimal length
1 maximal length
1.000 average length
The most interesting counters are:
- maximum which tells how many times IPF hit the fr_statemax barrier
- active which tells how many entries are present in state table
The limits themselves are IPF tunables, which the ipf command can print:
fr_statemax min 0x1 max 0x7fffffff current 4013
fr_statesize min 0x1 max 0x7fffffff current 5737
fr_state_lock min 0 max 0x1 current 0
fr_state_maxbucket min 0x1 max 0x7fffffff current 26
fr_state_maxbucket_reset min 0 max 0x1 current 1
ipstate_logging min 0 max 0x1 current 1
state_flush_level_hi min 0x1 max 0x64 current 95
state_flush_level_lo min 0x1 max 0x64 current 75
The most interesting variable is fr_statemax, which defines a hard limit that can not be exceeded. This is the value that is checked any time a new entry is inserted into the table. So once the admin finds out that the active counter (the number of state entries in use) and fr_statemax are getting close to each other, it's the right time to increase the table size. In the case presented here it's too late: active has already reached fr_statemax (both are 4013) and too many states were lost.
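The comparison of active against fr_statemax is easy to automate. The sketch below assumes the two text formats shown in this post (the "N active" line of ipfstat -s and the "name min .. max .. current N" tunable lines); the helper names and the 80% warning threshold are arbitrary choices for illustration.

```python
import re


def active_states(ipfstat_s_output):
    """Extract the 'active' counter from ipfstat -s style output."""
    match = re.search(r"^\s*(\d+)\s+active\s*$", ipfstat_s_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def tunable_current(listing, name):
    """Extract the current value of a tunable from a 'min/max/current' line."""
    pattern = r"\b" + re.escape(name) + r"\s+min\s+\S+\s+max\s+\S+\s+current\s+(\d+)"
    match = re.search(pattern, listing)
    return int(match.group(1)) if match else None


def table_usage(active, statemax):
    """Fraction of the state table limit currently in use."""
    return active / statemax


if __name__ == "__main__":
    stats = "4013 active\n0 expired\n"
    tunables = "fr_statemax min 0x1 max 0x7fffffff current 4013\n"
    usage = table_usage(active_states(stats),
                        tunable_current(tunables, "fr_statemax"))
    if usage >= 0.8:  # arbitrary threshold chosen for this example
        print(f"state table at {usage:.0%} of fr_statemax, consider raising it")
```

Run periodically, a check like this gives a warning before the lost counter starts to climb rather than after.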
There is another blog entry which discusses how the state table is implemented; it provides a kind of guideline for tuning the table.
The issue described here was brought to my attention quite recently, after a few months of relative calm. I had not heard about such problems for a long time until this April. The internal case report looked as follows:
After customer updated from 138888-03 to 138888-07 (plus other recommended patches), he notices that IPF sometimes "blocks" connections which should pass; e.g. mail from external.
- IPF was running fine with 138888-03 (no false blocks).
- It takes some days till the problem starts to occur.
- When ipf starts "blocking" customer has to restart ipf to make it "pass" again.
I still owe the answer to why IPF got so crazy and started to block packets once it had been updated to 138888-07. Sometimes good things have a bad impact. We all agree bug fixing is good. Unfortunately, fixing CR 6566976 had a bad impact. The fix for CR 6566976 is responsible for this kind of surprise, since with that fix IPF started to enforce the fr_statemax limit effectively.
Posted at 06:01PM Jul 16, 2009 by Alexandr Nedvedicky in Sun | Comments[3]
This is a very good description.
Posted by Alfred Vogelbacher on July 17, 2009 at 08:26 AM CEST #
Does it mean that every Solaris machine which uses ipfilter with 'keep state' rules is vulnerable to DoS attacks opening a lot of connections to the port[s] in question? (Wondering how iptables (Linux) handles those problems) ...
Posted by 87.188.94.48 on July 17, 2009 at 11:05 AM CEST #
Trying to answer the first part of the question: is Solaris IPF vulnerable? The answer is yes and no. The vulnerability depends solely on the configured size of the hash table. The bigger the hash table, the smaller the chance that packets will be dropped because of a lack of space in the state table.
I have not studied the design of iptables deeply enough to answer the question above. But I can bet there certainly is a limit on the number of state entries in its table. It's up to the admin to tune this limit to meet the network's needs. In general, the higher the bandwidth and the larger the network protected by the firewall, the larger the table must be.
The response to the DoS attack Alfred mentions depends on the deployment scenario. The response is basically determined by these factors:
- is IPF running as a host-based firewall?
- is the attack coming from the LAN?
- is the attack coming from the internet?
But basically the only countermeasure is to cut off the source of the DoS attack before the packets reach the firewall. That sounds odd, but it's exactly how you can cope with it: if the attack is coming from the internet, you usually contact your ISP to help you sort out the situation. What the ISP usually does is apply a temporary policy on the upstream routers to filter out suspicious packets.
From the perspective of the IPF admin, one of the possible options is to shorten the timeouts for state table entries. It will allow IPF to free up memory more quickly; on the other hand it might hurt users who are contacting remote servers with high latency. It's always a trade-off.
Posted by SashaN on July 21, 2009 at 01:26 AM CEST #