Random ramblings

20090212 Thursday February 12, 2009

The Insecure, Great, British Pædosieve: The use of web-cache software

In my previous post on the Great-British Pædowall (or sieve more accurately, given its usefulness) I mentioned potential insecurities generally. Some web-URL-blocking systems make use of the Squid Cache to implement the URL-matching-and-blocking functionality. This, I would argue, is a qualitative security risk, in at least two dimensions.

It is, I think, widely accepted that software defects are unavoidable and proportional to the complexity of a body of software. The rate of defects in software can be minimised, e.g. through certain, more costly, development practices and by restricting the complexity of the software (e.g. the amount of code, features, etc).

Squid is a long-standing software project to develop web-caching (not blocking) software. As it was designed to cache web content, it has to have a fairly sophisticated understanding of HTTP - not entirely trivial and certainly significantly more complex than applying string filters. Further, Squid can act in a distributed manner, co-operating with other caches by exchanging information with them - to this end it supports various inter-cache/proxy protocols (ICP, WCCP, etc.). Each of which, obviously, requires its own body of network exposed code. Additionally, it contains support for management protocols, both in-band (HTTP cache_object requests) and out-of-band (e.g. SNMP) which can be used to retrieve information - again requiring specific, network-exposed lumps of code.

Squid was developed via normal, open source development practices. The project originates from the 1990s, an era when the Internet was still a more trusting, less hostile place. Its coding practices reflect this to some extent: raw handling of network-supplied input abounds - this is (sadly) still very common in C/C++ network software generally (Squid is C++ now, but much of the code is imported C from the original Squid; there is little, if any, use of input-sanitising abstractions). Further, deployment in "friendly" environments is reasonably typical for Squid (e.g. corporate, where nearly all users are reasonably responsible adults who are contractually bound to be "friendly", on pain of substantial punishment).

So a) Squid is a reasonably featureful and complex piece of software, with several relatively independent, parallel bodies of code that can be invoked by a remote, untrusted user; b) There is no reason to think any special defect-minimisation processes were used to develop Squid. In combination, this means we should expect Squid to have a fairly large number of defects that can be triggered by an external attacker - some proportion of which will have disruptive consequences, possibly even leading to the execution of network-supplied code. A glance at the various vulnerability databases on the internet would seem to confirm this suspicion: Squid is subject to fairly regular reports, against pretty much all the protocol bits mentioned above. It is reasonable to think that many defects remain and that some would be easily found with fuzz-testing.

Simply put, this means a Squid cache will expose information and/or behave in ways which its operator did not anticipate. These ways may be by design - through features the operator was not fully aware of - or by defect. These are behaviours that are caused solely by having put this large piece of software in the path, and which could be fixed simply by removing it, without any loss of functionality (according to my argument in the earlier post).

The risks here are in two dimensions: a) risks to the operator; b) risks to the users. This may seem obvious, but it's an important distinction because the operator is likely to care more about the former risks than the latter. I.e. an ISP might take pre-emptive action to mitigate a risk to itself, but is less likely to expend similar effort on mitigating a risk to its users. This is somewhat speculative, but reasonably safe in the face of various lost-customer-data incidents seen in other industries.

Risks to the operator:

Risks to the user:

Finally, a typical Squid transparent proxy has no way to ensure that the original IP destination address matches the requested host. Squid will only see the HTTP-layer request, and can only route that. Therefore such an ISP is, obviously, operating an open-proxy as far as its customers are concerned. Whether there are any qualitative security problems with this, I don't know, but you can gain some amount of plausible deniability by setting a known filtered host as your HTTP proxy (e.g. for browsing, or for BitTorrent) and sending your own X-Forwarded-For header with various other customer IPs. Squid will add your IP to the end of the X-Forwarded-For, but the recipient may not be able to tell which was the real one.
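The X-Forwarded-For behaviour described above can be sketched in a few lines of Python. This is only an illustration of the append-only handling, not Squid's actual code; the function name is made up and the addresses come from the reserved documentation ranges.

```python
def append_forwarded_for(existing_header, client_ip):
    """Return the X-Forwarded-For value a proxy would pass upstream.

    Hypothetical helper: the proxy keeps whatever the client sent and
    appends the address it actually saw the connection come from.
    """
    if not existing_header:
        return client_ip
    return existing_header.strip() + ", " + client_ip


# A customer at 192.0.2.10 sends a forged header naming two other addresses.
forged = "198.51.100.7, 203.0.113.42"
seen_by_origin = append_forwarded_for(forged, "192.0.2.10")
print(seen_by_origin)  # 198.51.100.7, 203.0.113.42, 192.0.2.10
```

Only the last entry was written by the proxy; everything before it is client-supplied, and a recipient that does not know this (or does not log the header at all) has no easy way to pick the real address out.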

Recommendations:

Much of the investigation into this was done by an unnamed collaborator. Also, Richard Clayton's paper on BT Cleanfeed contains very useful information (e.g. techniques to find filtered hosts to test with). tcptraceroute and socat are very useful tools.

Updates: murb informs me that there are transparent-proxy patches for Squid and Linux. It's not clear to me whether these currently would allow Squid to match on the original destination IP though.

It appears the X-Forwarded-For should consistently point to the source behind the ISP pædo-sieve. That said, this header might not be logged by default by many systems; it's also conceivable that the code responsible for updating this header may contain exploitable defects.

( Feb 12 2009, 04:20:30 PM GMT ) Permalink Comments [0]

Paul isn't here right now..

To avoid some confusion: I have been on leave of absence from Sun for a good few months now. Other than, possibly, July-September of this year, my time is committed outside of Sun until mid-2010.

Further, I have few quality cycles to spare for Quagga. I do not regularly read the Quagga lists at this time. I'm not reviewing patches, never mind integrating them. Other people are going to have to take up the slack, I'm afraid.

( Feb 12 2009, 07:14:52 AM GMT ) Permalink Comments [0]

20090131 Saturday January 31, 2009

Note to phone system operators: Relative format caller-ID is evil

Why, oh why, do phone operators configure their systems to send relative phone numbers as the caller-ID? Which muppet made that decision? :)

Ever since countries, under the ITU, agreed to a common international, direct-dial numbering standard, phone numbers share a global, hierarchical number-space. This number-space is rooted (notionally) at "+", and leads to numbers like "+44 141 bcd efgh". Such a number is "fully-qualified" and uniquely identifies some port to the phone system, globally. E.g. this example is a number in the UK, nominally in Glasgow (the 141 code). One little nit is that the "+" can translate to different codes in different systems. Thankfully many countries use "00"; further, GSM phones can just dial "+", it seems (exactly how this works I don't know, but it seems to work with multiple phones on multiple networks in various countries - presumably it's part of the standard).

It is also possible to have "relative" phone numbers, which assume part of the prefix (aka Subscriber Trunk Dialling) - alternatively, you can think of this as "bottom up" or "local" dialling. E.g. if one is in the UK, there is no need to dial +44, just dial "0141 bcd efgh"; if one is in Glasgow (and certain other localities) there is no need to dial the 141 part, just dial "bcd efgh"; if one is dialling from a phone sharing the same bcd part as its primary number then that possibly can be dropped too (an exchange-local call, though the "bcd" part need not uniquely identify an exchange), just dial "efgh".
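Resolving a relative number back to its fully-qualified form only works if you know where it was dialled from. A minimal Python sketch, assuming a UK/Glasgow context (country code 44, area code 141, trunk prefix "0", international prefix "00"); the function name and the example digits are placeholders, and the exchange-local "efgh" case is not handled:

```python
def qualify(dialled, country_code="44", area_code="141",
            trunk_prefix="0", intl_prefix="00"):
    """Resolve a number as dialled into fully-qualified '+<digits>' form.

    The defaults describe the caller's location; a different location
    would resolve the same relative digits to a different number.
    """
    digits = dialled.replace(" ", "")
    if digits.startswith("+"):
        return digits                                   # already fully-qualified
    if digits.startswith(intl_prefix):
        return "+" + digits[len(intl_prefix):]          # "00" stands in for "+"
    if digits.startswith(trunk_prefix):
        return "+" + country_code + digits[len(trunk_prefix):]  # national form
    return "+" + country_code + area_code + digits      # area-local form


for form in ("+44 141 496 0000", "0044 141 496 0000",
             "0141 496 0000", "496 0000"):
    print(form, "->", qualify(form))  # each prints +441414960000
```

The point is the extra arguments: the fully-qualified form needs none of them, while every relative form is meaningless without them.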

So basically a fully-qualified number can be dialled from any public phone system in the world, and it should just work, while a relative number only works in certain places. The problem is that many operators today send Caller-ID in relative form. E.g. if someone rings my UK mobile, I will see something like "078a bcd efgh" or "0141 bcd efgh" on my phone's display. This leads to various annoyances, not least:

All these problems would be avoided if phone networks just sent fully-qualified, E.164-form numbers as their Caller-ID. Relative numbers == inconvenient == fail == evil.

Even more helpful are BT: BT's landline network disallows E.164 dialling! You can't dial "+44 141 bcd efgh" (you can't even dial "0141 bcd efgh" iirc if in Glasgow). This is a big, stinking pile of FAIL if you've got a home VoIP/POTS router and you want it to dynamically select whether to dial-out via BT or via VoIP - you need to apply a bunch of horrid number-rewriting rules to stand a chance of this working, and lower-end VoIP/POTS routers might not even be capable of doing so. I can't understand why BT would disallow fully-qualified dialling to UK numbers. BT just suck. (I think this works fine with Eircom in Ireland).
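To give an idea of the kind of rewriting rule a router would need for the landline route, here is a rough Python sketch. It assumes the behaviour recalled above (no "+", and no area code when calling within the same area), a Glasgow line, and simple prefix matching on the area code; the function name and numbers are illustrative only, and real dial plans are messier.

```python
def landline_dial_string(e164, country_code="44", area_code="141",
                         trunk_prefix="0", intl_prefix="00"):
    """Rewrite a fully-qualified number into what the landline will accept."""
    digits = e164.replace(" ", "").lstrip("+")
    if not digits.startswith(country_code):
        return intl_prefix + digits            # foreign: "+" becomes "00"
    national = digits[len(country_code):]
    if national.startswith(area_code):
        return national[len(area_code):]       # same area: local digits only
    return trunk_prefix + national             # elsewhere in the country


print(landline_dial_string("+44 141 496 0000"))  # 4960000
print(landline_dial_string("+44 20 7946 0000"))  # 02079460000
print(landline_dial_string("+353 1 234 5678"))   # 0035312345678
```

A VoIP route that accepts E.164 needs none of this, which is rather the point.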

To summarise:

( Jan 31 2009, 05:15:48 PM GMT ) Permalink Comments [1]

20080519 Monday May 19, 2008

How not to improve the security of your online banking website

This blog entry was inspired by the RBS web site

( May 19 2008, 12:07:14 PM IST ) Permalink Comments [0]

20080206 Wednesday February 06, 2008

USA taken offline by a cable cut!

Some other bad news that wasn't reported today: "I can't ping some-random-router.some-institute.edu - it used to work but it doesn't now. OMGZ DA USA IZ OFFLINE!!!".

That's bad news in the "regurgitate ill-informed rumours" sense: at least a few high-profile bloggers who should know better have fallen victim to it, simply because a simplistic web-page says some host is down (never mind news-aggregation sites who don't care what rubbish stories drive their impression count). Better and more fact-based commentary on how the cable-cuts have affected internet connectivity is out there.

( Feb 06 2008, 04:40:40 PM GMT ) Permalink Comments [0]

20071012 Friday October 12, 2007

Email regex address validation (aka "Web Developers and Clue, the Empty Set?")

Why, oh why, must I regularly be told by online registration forms that, sorry, but the email address I entered is not valid and I should enter a valid one before allowing registration? There are so many possible problems here:

  1. The web developer just assumed they know what an email address is
  2. The web developer knows what an email address is, but got the regex wrong
  3. The web developer knows about regexes, but tried to be too clever
  4. Syntactic validation still says nothing about functional validity

The most common cause is the first. To describe them as clueless muppets who should never be allowed to code up anything that remotely stands a chance of being something anyone else must ever interact with, would probably be too strong and not generally true, but it's close enough for me for now! ;). Really, if you're writing some web form that requires you to sanity-check an email address, how hard could it be to google for "email validation regex"? If they really were keen on doing it themselves, surely reading the addr-spec BNF would be a pre-requisite? A quick glance at least, surely? Sadly, it appears many (most?) web developers are incapable of such lofty levels of rigour. Worse, it appears some of these idiots have webpages with their incorrect, broken regexes ranked high in the google results.

The next problems are that it's easy to get regexes wrong, semantically at least, and it's also possible to get too clever. A good example is Stephen Shirley's email regex effort (javascript compatible). It's slightly wrong in not allowing foo@host addresses, a small oversight. It's also locale dependent in two ways. Firstly, 'a-z' can match non-ASCII characters in some locales (ie chars Stephen didn't intend to match). Secondly, Stephen's gotten too clever: the range match very sneakily relies on ASCII ordering - cute, but locale-fragile.

Before addressing the last possible problem, here's my simpler rendition of Stephen's regex, fixed to be locale-independent, split over multiple lines for readability, JavaScript compatible:


(")?([[:alnum:]!#$%&'*+/=?^_`{|}~-])+
(\.[[:alnum:]!#$%&'*+/=?^_`{|}~-]+)*(")?
@[[:alnum:]-]+(\.[[:alnum:]-]+)*\.?

I've checked this against RFC2822 and the ECMAScript specification for validity. Stephen's also had a look at it. For PHP, the following should work, using ereg:


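  // Local part: optional opening quote, then one or more atom characters...
  $regex  = "(\")?[[:alnum:]!#$%&'*+/=?^_`{|}~-]+";
  // ...then any number of dot-separated atoms and an optional closing quote.
  $regex .= "(\.[[:alnum:]!#$%&'*+/=?^_`{|}~-]+)*(\")?";
  // Domain: one or more dot-separated labels, with an optional trailing dot.
  $regex .= "@[[:alnum:]-]+(\.[[:alnum:]-]+)*\.?";
  
  // ereg() was removed in PHP 7; newer versions need preg_match() with a delimited pattern.
  if (ereg ($regex, $argv[1]))
    ....
  else
    ....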

However, neither of the above is guaranteed to be correct. It's easy, in trying to fully validate the syntax, to miss some subtleties of syntax or meaning (in what you're trying to validate, or in the form you're describing that syntax in). This suggests it's a bad idea to try.
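For comparison, here is a sketch of the same pattern in Python with the character ranges spelled out in plain ASCII, so that no locale-dependent class is involved. Python's re module (like ECMAScript regexes) does not understand POSIX classes such as [[:alnum:]], hence the explicit ranges. The names are made up, each domain label is taken to be one or more characters, and like the patterns above it is only an approximation of the RFC 2822 addr-spec, not a guaranteed-correct validator.

```python
import re

# Characters allowed in an unquoted local-part atom, written as explicit ASCII.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# One domain label: letters, digits and hyphens.
LABEL = r"[A-Za-z0-9-]+"

ADDR_RE = re.compile(
    r'"?' + ATEXT + r"+(?:\." + ATEXT + r'+)*"?'   # local part, optional quotes
    r"@" + LABEL + r"(?:\." + LABEL + r")*\.?"     # domain, optional trailing dot
)


def is_plausible_address(address):
    """True if the whole string matches the (approximate) pattern."""
    return ADDR_RE.fullmatch(address) is not None


for candidate in ("foo@example.com", "foo@host",
                  "first.last+tag@mail.example.org",
                  "no-at-sign", "foo@@example.com"):
    print(candidate, is_plausible_address(candidate))
```

Note that fullmatch anchors the pattern at both ends; whether that is appropriate depends on where the input comes from.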

Which brings us to the next problem: even if one manages to correctly positively validate the syntax, the address need not be functional. The system must still validate the email address, typically with a probe-email from which the registrant must retrieve a URL to complete the registration. Only then can the system know the email really is valid. Further, the developer cannot rely on the client-side JavaScript to have been run, or not have been subverted - they must sanitise the address server side too. So the syntax checking really only is for the convenience of the registrant, to prevent them aimlessly waiting for an email that might never come if they've typo-ed their address (and typo-ed it in such a way as to be syntactically invalid!). So really, given the difficulty of getting it right, given that it's for the user's convenience and given the system likely will functionally test the address anyway, the syntax check should not be mandatory!
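In code, "not mandatory" might look something like the following Python sketch: a deliberately loose check that only ever produces a warning, with the real validation left to the probe email. The function names, the warning text and the returned structure are all made up for illustration; sending the email and storing the token are left out.

```python
import secrets


def looks_like_email(address):
    """Deliberately loose: at least one '@' with something on either side."""
    local, sep, domain = address.rpartition("@")
    return bool(sep and local and domain)


def begin_registration(address):
    """Start a registration without rejecting odd-looking addresses."""
    warning = None
    if not looks_like_email(address):
        # Advisory only: the registrant may proceed regardless.
        warning = "This does not look like an email address; please double-check it."
    # The functional test: a token to embed in the URL of the probe email.
    token = secrets.token_urlsafe(16)
    return {"address": address, "token": token, "warning": warning}


print(begin_registration("foo@example.com")["warning"])        # None
print(begin_registration("typo.example.com")["warning"])
```

The only thing that marks an address as valid is the registrant coming back with the token.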

In conclusion:

Update: Removed the begin and end anchor matches - whether they're appropriate depends on context of input - and added PHP example.

Update2: Add support for quoted-string, and add hyphen to domain parts, as per comments

Update3: See discussion for further corrections (which should re-inforce how bad an idea it is to try enforce syntax checks arbitrarily..).

( Oct 12 2007, 04:36:41 PM IST ) Permalink Comments [6]

20071004 Thursday October 04, 2007

Scott Adams on Policy by Reality-TV

Excellent

( Oct 04 2007, 07:02:00 PM IST ) Permalink Comments [0]

20070921 Friday September 21, 2007

PINE is dead, long live Alpine

I'm a long-time user of the best MUA out there, PINE. It has some problems though: firstly, it is not free software (you may not distribute modified versions) and secondly, it doesn't support UTF-8. A while ago WU started work on a free rewrite of PINE, called Alpine. Alpine is now at v0.9999. I've been using it for a few weeks now and it does indeed seem to be quite complete and perfectly usable (there's only one regression from PINE I've noticed: roles only let you set one address in Reply-To).

Finally we have the very powerful but user-friendly, terminal-based PINE MUA, under a friendlier licence :). I wonder if I can get this included into SFW..

( Sep 21 2007, 08:36:01 PM IST ) Permalink Comments [1]

20070426 Thursday April 26, 2007

Cute algorithm of the day: Floyd Cycle Finder

Cute: Floyd's Cycle Finding Algorithm
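For the curious, a minimal Python sketch of the tortoise-and-hare idea, assuming the sequence x0, f(x0), f(f(x0)), ... does eventually repeat (otherwise it never terminates). The function name and the little example graph are made up for illustration.

```python
def find_cycle(f, x0):
    """Return (mu, lam): index of the first repeated value and the cycle length."""
    # Phase 1: the hare moves two steps per tortoise step until they meet,
    # which must happen somewhere inside the cycle.
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(f(hare))

    # Phase 2: restart the tortoise; moving both one step at a time,
    # they next meet exactly at the start of the cycle.
    mu = 0
    tortoise = x0
    while tortoise != hare:
        tortoise = f(tortoise)
        hare = f(hare)
        mu += 1

    # Phase 3: walk the hare once around the cycle to measure its length.
    lam = 1
    hare = f(tortoise)
    while tortoise != hare:
        hare = f(hare)
        lam += 1
    return mu, lam


# 0 -> 1 -> 2 -> 3 -> 4 -> back to 2
successor = {0: 1, 1: 2, 2: 3, 3: 4, 4: 2}
print(find_cycle(successor.__getitem__, 0))  # (2, 3)
```

The cute part is that it needs only two pointers' worth of state, however long the sequence is.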

( Apr 26 2007, 01:13:29 AM IST ) Permalink Comments [0]

20060412 Wednesday April 12, 2006

More HEANet / T2000 fun

colmmacc has some updates on the HEANet T2000. First off, he's tried out Solaris Express and straight away got an extra 3k request/sec, for 18k req/s. ;) Small bit of tweaking and the T2000 was pushing over 20k request/sec. Next, Colm's tried sun4v Ubuntu Linux, getting 15k req/sec.

See Colm's Niagara category for all his postings on the subject.

( Apr 12 2006, 02:26:10 PM IST ) Permalink Comments [2]

20060328 Tuesday March 28, 2006

HEAnet Niagara benchmark update

Intriguing Niagara benchmarks update from Colm:

A few days ago I raved about the 5700 requests per second I was getting out of the Niagara box. Turns out that was a load of crap, here’s what I’m getting now;
Requests per second:    15298.68 [#/sec] (mean)

Wow. :)

( Mar 28 2006, 05:48:56 AM IST ) Permalink Comments [1]

20060321 Tuesday March 21, 2006

Niagara/T1 RTL released under the GPL

The Verilog RTL for Sun's Niagara/T1 processor has been made available under the GPL. Very cool :).

There's a source browser available too.

( Mar 21 2006, 11:55:11 PM GMT ) Permalink

20060130 Monday January 30, 2006

Quagga 1.0.0 release blocker meta-bug

Quagga meta-bug #246 has been created to track 1.0.0 release blocking bugs. All regressions are automatically blockers. If you know of regressions in 0.99 please log a bug and set the 'blocks:' field to '246'. If you don't, please test 0.99.3 and help find them.

( Jan 30 2006, 12:55:10 PM GMT ) Permalink

20060129 Sunday January 29, 2006

Software engineering and IRIX 5.1

Via a post on PlanetGNOME, I came across a really interesting email to the Risks Digest which claims to be a repost of an SGI memo entitled "Software Usability II" examining the usability of IRIX 5.1 and how it had been adversely impacted by quality, then examining why. A good read, regardless of whether or not it is an accurate account of what went on at SGI in that period (no idea).

( Jan 29 2006, 01:35:04 AM GMT ) Permalink

Niagara / sun4v HyperVisor

CNet News have a story on UltraSPARC T1's (Niagara) hypervisor capabilities. For those curious, the sun4v HyperVisor specifications are freely available from the OpenSPARC T1 site.

At some stage in the future you will be able to run multiple instances and versions of Solaris on Tx000. Hopefully we'll see a Linux sun4v port too, maybe Free/Net BSD? (Not as familiar with those communities). Can't wait :).

( Jan 29 2006, 12:33:32 AM GMT ) Permalink
