I have just run out of options and even opened a support case, but so far no luck. To make the story short: a customer started to complain that every Sunday and Monday around 0:10 AM their monitoring shows a traffic drop. Initially it was hard to believe, so for one weekend we switched off fwaccel and set up tcpdump on the internal and external interfaces of one of the virtual systems. And really, we were able to identify that, at the given times, we see packets on the outside interface which never appear on the internal one. We are clearly observing retransmissions: a packet arrives at the external interface, but it is not forwarded. Such behavior lasts for 10 or even 30 seconds and repeats many times during a two-hour window; it is randomly visible in the dumps from midnight to about 2 AM, and then it is gone.

Traffic is permitted in the rulebase, and the rest of the time everything works flawlessly. During that window no backup is running, there is no policy update, routing is static and rather simple (default route plus directly connected networks), there are no drops in the traffic logs, and CPU is at 15% max. My best guess is that even Check Point support is puzzled by this; I have spoken with them and provided dumps and all available logs. Any tips on what I might set up to see the reason for such strange behavior, or anything I can set up now and analyze afterwards?

Maybe somebody will run into a similar problem in the future, so I will try to make the story longer. The customer was complaining that, sometimes and only during weekends, they were losing connections towards their servers. We are still running VSX R77.30 with the latest hotfix and no IPS blade. Later on, the customer provided the times when the connections were lost (exact dates omitted; screenshot).

With such a staggering recurrence to work from, we first checked the traffic logs. They looked "business as usual": no unexpected drops, nothing else suspicious. Because of the number of affected servers, we picked one source and destination combination, and for a few weekends we set up tcpdumps:

nohup tcpdump -s 96 -w /var/log/TCPDUMParp -C 200 -W 100 -Z root -i bond1.456 arp

In the dumps it was clearly visible that some connections could not get through: retransmissions, pings with missing replies and so on. Surprisingly, an outage took many seconds, 30 or more. Then for a minute traffic was visible, then drops again, then traffic again.

CPU history also showed a minor increase, but nothing to worry about. As it is VSX, history is available for the main gateway, not for a particular virtual system (or am I wrong?). The only suspicious thing was a sudden increase in interrupts, but those were evenly distributed among all cores, so we assumed that no elephant session occurred. We even opened a support case, believing we had missed something.

On one of those weekends we also ran:

nohup fw ctl zdebug -vs 10 drop | grep --line-buffered '20.30.40.50|10.20.30.40' | tee /var/log/fw_ctl_zdebug_drop_LOG.txt

which came back empty. (In hindsight, one likely reason: basic grep treats the "|" in the pattern as a literal character rather than alternation, so this filter can never match; grep -E would be needed.) We also never clearly figured out why the debug switched itself off after a few hours of running, even though there was no rulebase update, no IPS update and so on; so far I have had no time to review why.

So I took a strong coffee around midnight and manually ran "fw ctl zdebug -vs 10 drop", just watching the text go by. It is an internal firewall with only the occasional rulebase drop, otherwise very quiet. Then, around the reported time, real havoc started: a stream of drops. From the output we found that the source was a testing server running a product called Nessus, scanning every single TCP and UDP port on every server behind that firewall. It opened about 3000 new connections per second, exhausting the connection table in less than a minute. Furthermore, the customer had specifically demanded (about a year ago) an opening of "src server, any, any", so all traffic from that server was permitted by the rulebase and written into the connection table. So the customer was right: we were dropping traffic, and without any log.
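A side note on the zdebug pipeline that came back empty: its filter used plain grep, where `|` is a literal character, not alternation, so a pattern like `A|B` matches neither `A` nor `B` alone. A minimal, runnable illustration of the difference (the sample log lines are made up for illustration):

```shell
# In basic grep, '|' is a LITERAL character, not alternation,
# so 'A|B' only matches lines containing the literal string "A|B".
# grep -E (extended regex) treats '|' as alternation.
printf 'drop: src 10.20.30.40\naccept: src 1.2.3.4\n' > /tmp/drops.txt

basic=$(grep -c '20.30.40.50|10.20.30.40' /tmp/drops.txt || true)   # literal '|': nothing matches
extended=$(grep -cE '20.30.40.50|10.20.30.40' /tmp/drops.txt)       # alternation: first line matches
echo "basic=$basic extended=$extended"   # prints: basic=0 extended=1
```

The `--line-buffered` flag is still worth keeping in the real pipeline, so that `tee` receives each matching drop line immediately instead of in 4 KB chunks.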
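The "3000 new connections per second" figure is easy to sanity-check against the connection table. A back-of-envelope sketch, assuming a table limit of about 25,000 entries (an assumption: the actual limit is configurable per gateway and should be read on the box itself):

```shell
# Rough time-to-exhaustion of the connections table under a port scan.
# LIMIT is an ASSUMED default (~25,000 entries on older releases); the
# real value is per gateway (e.g. 'fw tab -t connections -s' in the VS context).
LIMIT=25000
RATE=3000                     # new connections per second, per the zdebug output
echo "$(( LIMIT / RATE ))"    # seconds until the table fills, ignoring expiry -> 8
```

Since half-open entries also linger until the gateway's start/session timeouts expire them, a scan at this rate keeps the table saturated for as long as it runs, which fits the observed pattern of traffic reappearing for a minute and then dropping again.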