Thursday February 12, 2009 The Insecure, Great, British Pædosieve: The use of web-cache software
In my previous post on the Great-British Pædowall (or sieve more accurately, given its usefulness) I mentioned potential insecurities generally. Some web-URL-blocking systems make use of the Squid Cache to implement the URL-matching-and-blocking functionality. This, I would argue, is a qualitative security risk, in at least two dimensions.
It is, I think, widely accepted that software detects are unavoidable and proportional to the complexity of a body of software. The rate of defects in software can be minimised, e.g. through certain, more costly, development practices and by restricting the complexity of the software (e.g. the amount of code, features, etc).
Squid is a long-standing software project to develop web-caching (not blocking) software. As it was designed to cache web content, it has to have a fairly sophisticated understanding of HTTP - not entirely trivial and certainly significantly more complex than applying string filters. Further, Squid can act in a distributed manner, co-operating with other caches by exchanging information with them - to this end it supports various inter-cache/proxy protocols (ICP, WCCP, etc.). Each of which, obviously, requires its own body of network exposed code. Additionally, it contains support for management protocols, both in-band (HTTP cache_object requests) and out-of-band (e.g. SNMP) which can be used to retrieve information - again requiring specific, network-exposed lumps of code.
Squid was developed via normal, open source development practices. The project originates from the 1990s, an era when the Internet was still a more trusting, less hostile place. Its coding practices reflect this to some extent: raw handling of network-supplied input abounds - this is (sadly) still very common in C/C++ network software generally (Squid is C++ now, but much of the code is imported C from the original Squid; there is little, if any, use of input-sanitising abstractions). Further, deployment in "friendly" environments is reasonably typical for Squid (e.g. corporate, where nearly all users are reasonably responsible adults who are contractually bound to be "friendly", on pain of substantial punishment) .
So a) Squid is a reasonably featureful and complex piece of software, with several relatively independent, parallel bodies of code that can be invoked by a remote, untrusted user; b) There is no reason to think any special defect-minimisation processes were used to develop Squid. In combination, this means we should expect Squid to have a fairly large number of defects that can be triggered by an external attacker - some proportion of which will have disruptive consequences, possibly even leading to the execution of network-supplied code. A glance at the various vulnerability databases on the internet would seem to confirm this suspicion: Squid is subject to fairly regular reports, against pretty much all the protocol bits mentioned above. It is reasonable to think that many defects remain and that some would be easily found with fuzz-testing.
Simply put this means a Squid cache will expose information and/or behave in ways which its operator did not anticipate. These ways may be by design - through features the operator was not fully aware of - or by defect. Behaviours that are caused solely by having put this large piece of software in the path, and could be fixed simply by removing it, without any loss of functionality (according to my argument in the earlier post).
The risks here are in two dimensions, a) Risks to the operator; b) Risks to the users. This may seem obvious, but it's an important distinction because the operator is likely to care more about the former risks than the latter. I.e. an ISP might take pre-emptive action to mitigate a risk to itself, but is less likely to expend similar effort on mitigating b. This is somewhat speculative, but reasonably safe in the face of various lost-customer-data incidents seen in other industries.
Risks to the operator:
Risks to the user:
Finally, a typical Squid transparent proxy has no way to ensure that the original IP destination address matches the requested host. Squid will only see the HTTP-layer request, and can only route that. Therefore such an ISP is, obviously, operating an open-proxy as far as its customers are concerned. Whether there are any qualitative security problems with this, I don't know, but you can gain some amount of plausible deniability by setting a known filtered host as your HTTP proxy (e.g. for browsing, or for BitTorrent) and sending your own X-Forwarded-For header with various other customer IPs. Squid will add your IP to the end of the X-Forwarded-For, but the recipient may not be able to tell which was the real one.
Recommendations:
Much of the investigation into this was done by an unnamed collaberator. Also, Richard Clayton's paper on BT Cleanfeed contains very useful information (e.g. techniques to find filtered hosts to test with). tcptraceroute and socat are very useful tools.
Updates: murb informs me that there are transparent-proxy patches for Squid and Linux. It's not clear to me whether these currently would allow Squid to match on the original destination IP though.
It appears the X-Forwarded-For should consistently point to the source behind the ISP pædo-sieve. That said, this header might not be logged by default by many systems; it's also conceivable that the code responsible for updating this header may contain exploitable defects.
( Feb 12 2009, 04:20:30 PM GMT ) Permalink Comments [0]