Alan Burlison's Work Related Ramblings


20071119 Monday November 19, 2007

How to leave Facebook - followup 1

The electrons were barely dry on my last post when I received an email from TRUSTe about the problems I'd had getting Facebook to close my account; the interesting bit is below:

Thank you for submitting your privacy complaint through the TRUSTe Watchdog Dispute Resolution program. The TRUSTe Compliance Team has reviewed the details of your complaint and we have determined that it is a valid privacy complaint. We have contacted www.facebook.com on your behalf and have outlined the steps necessary for proper resolution.

So my advice to you if you are having problems getting Facebook to close your account is to submit a complaint to TRUSTe.

Posted by alanbur ( Nov 19 2007, 07:31:35 PM GMT ) Permalink Comments [1]

How to leave Facebook

As I documented in my last post, it isn't actually possible to leave Facebook; all you can do is 'deactivate' your account. I got in touch with Facebook and asked them to delete my account, and here is the reply I got from them:

If you deactivate, your account is removed from the site. However, we save all your profile content (friends, photos, interests, etc.), so if you want to reactivate sometime, your account will look just the way it did when you deactivated. If you do want your information completely wiped from our servers, we can do this for you. However, you need to remove all profile content before we can do this. Once you have cleared your account, let us know and we'll take care of the rest.

I wrote back to Facebook, saying that their response was unacceptable. I noted that their Privacy Policy page says that they are a licensee of the TRUSTe organisation, and that as such they are supposed to give users "choice and consent over how their information is used and shared". I also pointed out that as they are now registered in the UK, they are probably also subject to UK data protection legislation. Finally, I pointed out that Facebook had also been mentioned in a Channel 4 news report about identity theft, and that the media were obviously interested in Facebook's stance on data privacy and protection. I explained that if Facebook wasn't prepared to close my account I was prepared to take up the issue with the three avenues open to me, the TRUSTe complaints process, the UK Information Commissioner's Office (ICO) and the UK press.

In return I got exactly the same response as the one above. I wrote back to Facebook yet again, repeating that their response was unacceptable, and that I was therefore going to take the three courses of action I outlined above. I registered complaints with both TRUSTe and the ICO, and I also emailed Channel 4 News, explaining my story.

Last week Channel 4 came to interview me, and the item went out on Channel 4 News on Saturday 17th November. A video of the item can be found on the Channel 4 website. There are also details of the response from Facebook to C4's questions about their policy and process for account closures. Once the item had aired, I wrote again to Facebook, explaining that their response was still unacceptable, and that I'd taken the three options I'd identified in my earlier mail. Here's an excerpt from my mail to Facebook:

The Channel 4 web page I refer to above says:

----------
Vanessa Barnett, an internet lawyer with Berwin Leighton Paisner, told Channel 4 News: "The Data Protection Act is designed to protect individuals like me from having our data used in ways that we don't want. We get to choose how data gets processed, what people can do with it, and if we don't like it, we can say, 'Please stop'"

"Ultimately it's a question for the information commissioner as to whether someone is in breach of the act. And he has to balance two different things. Yes certainly, I as an individual have the right to say, 'please don't have my data,' but he also has to balance the rights of the business not to have to expend lots of money trying to get rid of that data."

So could Facebook argue that it's just impossible for them to provide an easier way to delete data? Or that they don't have the money to implement one? They didn't make that claim to us. In fact, they didn't engage with the question of why they need to retain data at all - they just didn't answer it.

Vanessa Barnett again: "One of the very key things that the information commissioner will look at is the resources of the business. And if that business has lots of money and lots of IT infrastructure, has the capabilities for example to easily write scripts to delete it, that will certainly sway the information commissioner into whether that data should have been deleted."
----------

I also notice that Facebook make the following statement on their Privacy Policy page:

----------
In the event that we learn that we have collected personal information from a child under age 13 without verification of parental consent, we will delete that information as quickly as possible. If you believe that we might have any information from or about a child under 13, please contact us at XXXXXXXX.
----------

So it seems quite clear that Facebook *does* have the ability to delete accounts from the system, but for some reason chooses not to, other than for children under 13. I will be pointing this out to the UK Information Commissioner.

Once again, I reiterate my case - Facebook has a duty to make it possible for users to delete their accounts in a reasonable and convenient manner, and from the statement on the Facebook Privacy page, Facebook clearly already has the mechanisms in place to make this possible.

I await your response with interest.

As well as sending my mail to the Facebook support person I had been dealing with, I also sent it to Chris Kelly, Facebook's Chief Privacy Officer, and Mark Zuckerberg, the Facebook CEO. Neither mail bounced, so I must have guessed their email addresses correctly. Earlier on today I received the following response from Facebook:

We have permanently deleted your account per your request. We do not retain any information about your account once it is deleted, and thus deletion is irreversible. Please let me know if you have any other questions or concerns.

Hurrah! Although to be honest, this raises almost as many questions as it answers. If Facebook has the ability to delete accounts so easily, why don't they make it available to users? In their written response to C4 they say that "Facebook does not use any information from deactivated accounts for advertising purposes." If that is the case, why do they retain the information at all? And although they aren't using it for "advertising purposes", are they making other use of it, and if so, what?

I'm still waiting for responses from TRUSTe and the ICO, and I'll be sure to blog about them when I receive them. In the meantime, if you want to get Facebook to delete your account entirely, you can always try mailing them, quoting the clear precedent they have set by closing my account. I really can't understand why Facebook make the whole process so difficult: they are an extremely popular service, and the amount of work involved in closing accounts properly is tiny in comparison to the volume of activity the site sees.

Posted by alanbur ( Nov 19 2007, 06:35:38 PM GMT ) Permalink Comments [9]

20071102 Friday November 02, 2007

Facebook and your lack of privacy

I've just attempted to delete my Facebook account, only to find this on the 'deactivate' page:

Opt out of receiving emails from Facebook. Note: Even after you deactivate, your friends can still invite you to events, tag you in photos, or ask you to join groups. If you opt out, you will NOT receive these email invitations and notifications from your friends.

You can reactivate your account at any time by logging in with your email and password.

So quite clearly they DON'T actually delete your data, and I have been unable to find an option on the website to do this. I've emailed their privacy department, it will be interesting to see what response I get...

Posted by alanbur ( Nov 02 2007, 09:53:49 AM GMT ) Permalink Comments [0]

20071015 Monday October 15, 2007

Producing CSS with self-caching JSPs

One thing that's always bugged me about CSS is that it doesn't have any sort of macro facility, so if you want to use (say) an HTML colour code in several places you need to hard-code it into all of them, which is a right pain. As the app I'm working on is written using JSPs, the obvious thing to do was to output the stylesheet from a JSP, and then use EL to define variables and then insert them in the appropriate places, e.g.:

<c:set var="background" value="#292B4F"/>

body {
    background-color:   ${background};
}

That works just fine, but each time the stylesheet is referenced it is regenerated, and as the content is completely static that was obviously a bit inefficient. Some googling found a number of ways to cache the output of JSPs, from the simple to the extremely complex, e.g. using a servlet filter to intercept requests and cache output. I wanted the simplest possible mechanism that would do the job, with no external dependencies.

A fairly obvious technique is to cache the output in the servlet's application scope the first time it is generated, then on subsequent requests you just send back the cached output rather than regenerating the content. The basic outline looks like this:

<c:if test="${empty cachedStylesheet}">
<c:set var="cachedStylesheet" scope="application">

    <%-- Cached content goes here --%>

</c:set>
</c:if>
${cachedStylesheet}
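
For readers who prefer to see the idea outside JSP, here is a minimal sketch of the same generate-once-then-reuse pattern in plain Java. The OutputCache class and its map are hypothetical stand-ins for the servlet application scope, not part of the JSP above, and the sketch assumes a Java version with lambdas.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the servlet application scope: a shared map
// that lives for as long as the web application does.
final class OutputCache {
    private final Map<String, String> scope = new ConcurrentHashMap<>();

    // Returns the cached text for the key, running the generator only
    // when nothing has been cached under that key yet.
    String get(String key, Supplier<String> generator) {
        return scope.computeIfAbsent(key, k -> generator.get());
    }

    public static void main(String[] args) {
        OutputCache cache = new OutputCache();
        int[] runs = {0};
        Supplier<String> stylesheet = () -> {
            runs[0]++; // count how often the content is really generated
            return "body { background-color: #292B4F; }";
        };
        System.out.println(cache.get("cachedStylesheet", stylesheet));
        System.out.println(cache.get("cachedStylesheet", stylesheet));
        System.out.println("generated " + runs[0] + " time(s)"); // generated 1 time(s)
    }
}
```

The second call returns the stored text without running the generator again, which is all the c:if / c:set pair is doing.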

That works fine, but if you watch the conversation between the browser and the servlet, the cached content is still refetched each time the stylesheet is referenced. This is because JSP output is usually dynamic, so the servlet container sets things up so that the client browser doesn't cache the content obtained from JSPs. To fix this we need to manually add the appropriate headers to the HTTP response we send back to the client so that it knows to cache the content, and we also need to respond to requests from the browser asking if the content has changed. The relevant HTTP headers are If-Modified-Since, Last-Modified and Expires; see the HTTP specification for more details. This requires a little bit of additional inline Java code in the JSP, so the final version looks like this:

<%@page contentType="text/css; charset=UTF-8" pageEncoding="UTF-8" session="false"%>
<%@taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>
<c:if test="${empty cachedStylesheet}">
<c:set var="cachedStylesheet" scope="application">

<%-- Cached content goes here --%>

</c:set>
<c:set var="cachedStylesheetDate"
  value="<%= new Long(new java.util.Date().getTime()) %>"
  scope="application"/>
</c:if>
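<%
  // Time the cached stylesheet was generated, in milliseconds.
  long date = (Long) application.getAttribute("cachedStylesheetDate");
  // getDateHeader returns -1 if the browser sent no If-Modified-Since header.
  long mod = request.getDateHeader("If-Modified-Since");
  // HTTP dates only have one-second resolution, so compare whole seconds.
  if (mod != -1 && mod / 1000 >= date / 1000) {
      response.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
      return;
  }
  response.setDateHeader("Last-Modified", date);
  // Ask the browser to check again in 24 hours.
  response.setDateHeader("Expires", new java.util.Date().getTime() + 86400000);
%>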
${cachedStylesheet}

Here's how it works: We cache the content the first time the JSP is accessed as before, but we also set an application scope variable cachedStylesheetDate that records the time when the output was generated. Then on each request we fetch the value of any If-Modified-Since header specified by the browser. If the content was generated no later than that date, the browser already has an up-to-date version of the content, so we just send back a 304 response (Not Modified) to indicate that fact. Otherwise we set the Last-Modified header to the date when the content was generated, and then set the Expires header to tell the browser to check again in (60 * 60 * 24) * 1000 = 86400000 milliseconds, i.e. in 24 hours, and then we send the cached output. That way, after the initial fetch, the browser will only check once a day to see if the content has been updated - that figure can of course be adjusted as necessary.
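
The decision described above can be pulled out of the servlet API and sketched in plain Java, which makes it easy to try with different timestamps. This is only an illustration: the ConditionalGet class and its method names are made up for the example. It assumes, as the servlet getDateHeader call does, that a missing header is reported as -1, and it compares in whole seconds because HTTP dates carry no milliseconds.

```java
// A sketch of the 304-or-resend decision, separated from the servlet API.
final class ConditionalGet {
    // (60 * 60 * 24) * 1000 milliseconds, i.e. 24 hours.
    static final long ONE_DAY_MS = 60L * 60 * 24 * 1000;

    // HTTP dates carry whole seconds, so drop the milliseconds.
    static long toSeconds(long millis) {
        return millis / 1000;
    }

    // True when the browser's copy is at least as new as the cached content.
    // ifModifiedSince is -1 when the request carried no such header.
    static boolean notModified(long ifModifiedSince, long generatedAt) {
        return ifModifiedSince != -1
            && toSeconds(ifModifiedSince) >= toSeconds(generatedAt);
    }

    // Value for the Expires header: one day after the given time.
    static long expiresAt(long now) {
        return now + ONE_DAY_MS;
    }

    public static void main(String[] args) {
        long generatedAt = 1192458599123L;            // arbitrary example time
        long echoed = (generatedAt / 1000) * 1000;    // what a browser would send back
        System.out.println(notModified(-1, generatedAt));            // false: first fetch
        System.out.println(notModified(echoed, generatedAt));        // true: send 304
        System.out.println(notModified(echoed - 1000, generatedAt)); // false: stale copy
        System.out.println(expiresAt(0));                            // 86400000
    }
}
```

The second case is the one that matters in practice: the browser echoes back the Last-Modified value it was given, truncated to the second, so the comparison has to treat "equal" as "not modified".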

Posted by alanbur ( Oct 15 2007, 03:29:59 PM BST ) Permalink Comments [0]

20070929 Saturday September 29, 2007

The new opensolaris.org user management webapp moves into the light

As outlined here I've been working away on providing the infrastructure needed to break up the current monolithic J2EE application used to host opensolaris.org. I've just released the source code of the existing portal, a large J2EE application. As detailed in the restructuring document, the intention is to split the functionality of the current webapp into smaller, more manageable parts. As I was releasing the portal code, I thought it was a good time to move my development repository outside as well. Accordingly, you can browse the source at:

http://src.opensolaris.org/source/xref/website/auth/trunk/

and the source can be accessed via anonymous SVN at:

svn+ssh://anon-AT-opensolaris.org/svn/website/auth

It is *very* rudimentary and is *far* from being finished. What I've put on opensolaris.org is just a snapshot of my development workspace as of today. However the database schema is available, along with a small set of dummy data which is used to populate the Derby database currently used by the prototype webapp. If you have any questions, I encourage you to subscribe to the website-discuss mailing list and ask them there.

Posted by alanbur ( Sep 29 2007, 12:48:25 AM BST ) Permalink Comments [3]

Source code for the opensolaris.org portal released

After a long haul, I've finally released the source code of the J2EE web application that runs the opensolaris.org website. If you are interested, the source can be browsed online at:

http://src.opensolaris.org/source/xref/website/portal/trunk/

and the source can be accessed via anonymous SVN at:

svn+ssh://anon-AT-opensolaris.org/svn/website/portal

If you have any questions, I encourage you to subscribe to the website-discuss mailing list and ask them there.

Enjoy!

Posted by alanbur ( Sep 29 2007, 12:41:22 AM BST ) Permalink Comments [0]

20070823 Thursday August 23, 2007

Acrobat on Solaris x86 - a hopeful sign!

The folks over at Adobe have just set up a blog entitled Adobe Reader on Unix, and the first post says:

It's time to get the ball rolling for the much awaited blog for Adobe Reader on Unix platforms. The purpose of this blog is to provide a platform for developers and the users of the product to share ideas, experiences and feedback about the product for the benefit of everyone.

Which is a very hopeful sign - the version of Acrobat which is currently available on Solaris x86 is ancient, and I'm certain an up-to-date version would be warmly welcomed all round. Why not head over to the Adobe blog and give them some encouragement? ;-)

Posted by alanbur ( Aug 23 2007, 02:32:26 PM BST ) Permalink Comments [1]

20070621 Thursday June 21, 2007

Greg P on the BBC

Just noticed an interview with Greg Papadopoulos, Sun's CTO, in the Technology section of the BBC News website. The interview is about Greg's views of future technology trends. He makes some interesting points about mobile phones, PCs, the inexorable rise of the network, and kitchen utensils :-)

Posted by alanbur ( Jun 21 2007, 12:29:38 PM BST ) Permalink Comments [2]

20070614 Thursday June 14, 2007

Visiting Baby - the world's first stored-program computer

I had to attend a meeting in Manchester on Tuesday afternoon, so while I was there I took the chance to drop in on Baby, or as it is more properly known, the Small-Scale Experimental Machine, or SSEM for short. Baby was the world's first stored-program electronic digital computer, running its first program on June 21 1948. Now before all you US citizens start getting all uppity and start telling me that ENIAC first ran two years earlier in 1946, the unique and ground-breaking feature of Baby is that the program which it ran was stored electronically and in the same store as the data, making it the first machine with a von Neumann architecture - Baby is therefore the ancestor of every modern computer. ENIAC used decimal arithmetic and was hard-wired - changing the program required a lady with a pair of pliers. And anyway, ENIAC was beaten by the British Colossus (1943), which was both binary and electronic, but not Turing-complete, and in turn Colossus was beaten by the German Z3 machine (1941) which was Turing complete and used binary arithmetic, but was electromechanical.

Right, now we've got the historical arguments sorted, on to the machine itself. The machine was constructed at the University of Manchester by Frederic Williams, Tom Kilburn and Geoff Tootill. During WWII, Williams and Kilburn had both worked at the Telecommunications Research Establishment, which was a cover organisation set up to do work on radar.

The ground-breaking feature of Baby was the Williams-Kilburn tube. This was a cathode ray tube which formed the memory of the machine. Williams and Kilburn had extensive experience of CRTs from developing WWII radar equipment, and in fact the tubes they used in Baby were from radar equipment. The glass surface of the tube was used as the store - the electron beam was used to write a grid of charged spots on the glass, with different charge levels representing 1 and 0. The values were read off via a metal plate on the outside of the tube. The charges leaked away over time, so the contents had to be refreshed on a regular basis, in a very similar way to modern DRAM. Baby used a 32 x 32 pattern of dots, giving a memory capacity of 32 words each of 32 bits. Williams-Kilburn tubes provided random access to the stored data, unlike the alternative technology of the time, the mercury delay line, which only provided serial access. Baby actually used four tubes: one as the main store (memory), one as the Accumulator, and another to hold the address of the current Instruction (CI - Control Instruction) and the instruction itself (PI - Present Instruction). The final tube was used to mirror the contents of the other three, the tube displayed being switchable. The display tube was necessary because the other three tubes had a pick-up plate on the front, and they were also heavily shielded to prevent electrical interference.

The instruction set used bits 0-12 of a 32-bit word to hold the target address, and bits 13-15 to define the operation. The instruction set was very simple with just 7 instructions:

JMP S - jump to the address held in store line S
JRP S - relative jump: add the contents of store line S to the current instruction address
LDN S - load the negated contents of store line S into the accumulator
STO S - store the accumulator in store line S
SUB S - subtract the contents of store line S from the accumulator
CMP - skip the next instruction if the accumulator is negative
STP - stop

Note there is no addition operation - you can implement addition using subtraction and negation, but you can't do the inverse. The reason for the limited memory and instruction set is that Baby was intended purely as a technology test-bed. The work carried out fed directly into later machines such as the Manchester Mark 1 and the Ferranti Mark 1. In 1995 the decision was taken to build a complete replica of Baby in time for the 50th anniversary celebrations in 1998 and this effort was successful, with the resulting replica being housed in the Museum of Science and Industry in Manchester. The machine is powered up every Tuesday, hence my visit.
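
To make the instruction layout and the addition trick concrete, here is a small Java sketch. It is not one of the simulators mentioned later in this post. The class and method names are invented for the example, the function numbers (2 for load-negative, 3 for store, 4 for subtract, 7 for stop) are the ones commonly documented for the SSEM rather than anything stated above, bit 0 is assumed to be the least significant bit, and the run loop is a simplification that starts at line 0 and does not handle jumps.

```java
// A simplified sketch of SSEM-style instruction decoding and of
// addition built from "load negative" and "subtract".
final class BabySketch {
    // Function numbers as commonly documented for the SSEM (an assumption here).
    static final int LDN = 2, STO = 3, SUB = 4, STP = 7;

    // Bits 0-12 of the word hold the target address.
    static int address(int word) { return word & 0x1FFF; }

    // Bits 13-15 of the word select the operation.
    static int function(int word) { return (word >>> 13) & 0x7; }

    static int encode(int function, int address) { return (function << 13) | address; }

    // Executes a straight-line program from line 0 until a stop instruction.
    static void run(int[] store) {
        int acc = 0;
        for (int ci = 0; ci < store.length; ci++) {
            int s = address(store[ci]) & 31; // only 32 store lines exist
            switch (function(store[ci])) {
                case LDN: acc = -store[s]; break;
                case STO: store[s] = acc; break;
                case SUB: acc -= store[s]; break;
                case STP: return;
                default: throw new IllegalArgumentException("jumps are not part of this sketch");
            }
        }
    }

    // a + b computed as -((-a) - b), using lines 20-23 for data.
    static int add(int a, int b) {
        int[] store = new int[32];
        store[0] = encode(LDN, 20); // acc = -a
        store[1] = encode(SUB, 21); // acc = -a - b
        store[2] = encode(STO, 22); // temp = -(a + b)
        store[3] = encode(LDN, 22); // acc = a + b
        store[4] = encode(STO, 23); // result = a + b
        store[5] = encode(STP, 0);
        store[20] = a;
        store[21] = b;
        run(store);
        return store[23];
    }

    public static void main(String[] args) {
        System.out.println(add(19, 23)); // 42
    }
}
```

Five instructions for one addition looks wasteful, but it shows why a load-negative plus a subtract was enough for a test-bed machine.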

The picture on the left shows the entire machine. The rack on the far left contains the power supplies, the rack on the far right contains the storage CRTs. In the centre is the panel containing the controls, switches and display. The remaining racks contain the machine's logic circuits. The photo on the right shows a close-up view of the control rack. At the top is the display CRT - if you look carefully you can see that a program is running, an animation of a ship is visible at the bottom of the tube, sailing from left to right. The knobs to the right of the tube allow you to select the tube which is being displayed. Immediately below is the 'typewriter' which is used to input the program, each switch corresponds to one bit. There are 40 switches, so 8 of them are unused. Below that is a panel containing switches to select the line number in the store to modify (top) and switches to select the Function (opcode) to be executed (bottom). Finally, at the bottom of the picture is a row of control switches used to clear, load, run and stop the program.

Great pains have been taken with historical accuracy - the people who worked on the original machine were consulted wherever possible, and many hours were spent poring over original notebooks and old photos, identifying the exact placement of components and labelling. The same components as were used in the original machine were used throughout. Even the racks are authentic, even though some of them had to be rescued from someone's garden! The racks are standard Post Office ones with 19 inch mountings - anyone recognise that dimension? ;-) They even went as far as replicating the numbering on the bottom of the racks, although nobody knew what the numbers were, and the chances were that the racks were second-hand in the original machine. My favourite touch is that in the original machine the display CRT (top of the right-hand picture) was propped up on a cardboard valve box as the cutout in the panel was too big - that's been replicated too!

The first program to run on the machine was written by Tom Kilburn, and calculated the highest factor of 2^18 (262,144). One of the other early programs written for the machine was a long division routine written by Alan Turing, who was working at the National Physical Laboratory at the time, but who was shortly to move to Manchester University.

I asked how reliable the replica was, and remarkably it seems to be very reliable, unlike the original version. It seems that the main reason is that the valves in the replica are mostly 1950s and 1960s vintage, and manufacturing techniques improved rapidly in the period after the war. The machine is run every week, and in the last 10 years only 3-4 valves have failed. More problematic are the old wire-wound resistors, which tend to fail more often.

To coincide with the 50th anniversary there was a programming competition. As a result several simulators are available, along with a programmer's reference manual and example programs. My favourite emulator is David Sharp's, which has a graphical interface that is similar to the real machine. It is written in Java, and I've been working with David to produce a Webstart version (not available yet) as well as trying to improve the visual accuracy of the simulator.

It was fascinating to read some of the documents and papers that have been written about the machine, and to see the machine actually running. Many of the concepts and techniques we still use today such as dynamically refreshed memory and relative jumps first appeared in this machine and it is amazing when you realise just how architecturally similar the SSEM is to modern machines, despite its appearance. I knew that Manchester had played a part in the early history of computing, but until I spent some time reading about the SSEM I hadn't realised just how pivotal that contribution had been.


Posted by alanbur ( Jun 14 2007, 02:01:04 PM BST ) Permalink Comments [1]

20070529 Tuesday May 29, 2007

Someone had to do it

/dev/bollocks goes Web 2.0

Posted by alanbur ( May 29 2007, 07:04:49 PM BST ) Permalink Comments [0]

The birth of a new programming language

A rare chance to watch (and even contribute to!) the birth of a new programming language - lolcode.com. 1337!

Posted by alanbur ( May 29 2007, 06:15:31 PM BST ) Permalink Comments [0]

20070503 Thursday May 03, 2007

XML-based J2EE frameworks considered harmful

Are you using one of the XML-based web frameworks such as Spring, WebWork or Struts?  Then you've been duped.  Conned. Flimflammed. Bamboozled. Hornswoggled.  (Yes, this is a rant ;-) Here's why:

I'm fully aware that the list above is crammed with generalisations, and that there are various hacks and workarounds for some of the issues. However the overall point I'm making is that XML isn't a programming language, yet virtually all of the major J2EE frameworks use it as if it is. The pervasive use of XML for tasks for which it is not suited has effectively discarded the last 30 years of software engineering advances. This is a huge mistake, and I expect that in 5 years' time people will look back at the current XML mania and say "What the hell were we thinking of?"

Posted by alanbur ( May 03 2007, 12:44:05 PM BST ) Permalink Comments [5]

20070405 Thursday April 05, 2007

Why I hate XML configuration files

I'm working on a reasonably large (1000+ source files) J2EE application that makes extensive (some might say utterly excessive) use of external components - it requires in the order of 50 JAR libraries to run, over and above the standard J2EE ones.

And of course many of those JAR files have their own unique XML files to configure them.

And of course many bits of information related to the configuration of the application (URLs, database details etc) have to be repeated in more than one of those XML files.

And there's no global way of doing this.

So the people who originally wrote the application came up with a scheme - they'd store the configuration values in property files.

But that only helped for things that got the values dynamically, for example by using the J2SE Properties class.  All those external components they'd used didn't know anything about the application's property files, they only knew about their own unique little XML files.

So the people writing the application came up with another scheme - they'd embed tokens in all those little XML files, then use the Ant Filter task to replace them at build-time with the values from the property files.

"Job's a good 'un!" they doubtless exclaimed, flushed with their extreme cleverness.


Then I came along and had to maintain the beast, when I found:

All this means that there's a whole series of bear traps set for anyone who innocently changes any of the properties files of the application, in the mistaken assumption that the application will actually take any notice of them.

I'm sure I'm not the only person to have been hit by this problem - how to configure multiple external components which all have their own XML configuration files - without having to hand-edit each of the XML files every time something changes, and without hard-coding the configuration at build time. However I'm damned if I know of a good way of doing it - although I can easily think up several not-very-good ways of doing it.

If you know, please let me know!

Posted by alanbur ( Apr 05 2007, 11:38:44 PM BST ) Permalink Comments [1]

20070116 Tuesday January 16, 2007

It's official - Sun is the #1 contributor to Open Source - by a long way

Just seen this link posted internally: the rather windily-named Study on the Economic impact of open source software on innovation and the competitiveness of the Information and Communication Technologies (ICT) sector in the EU. To cut to the chase, Sun is acknowledged as being the number one contributor to Open Source, outstripping the second contributor (IBM) by nearly 3½ times.  I've reproduced the relevant table below:

Table 5: Cost estimate for FLOSS code contributed by firms

Total contribution from firms
  Number of firms:       986
  Source lines of code:  31.2 million
  Estimated effort:      16444 person years
  Estimated cost:        1.2 billion Euro

Top contributors
  Rank  Name                            Person-months  Cost (mil euro)
     1  sun microsystems inc.                   51372              312
     2  ibm corp.                               14865               90
     3  red hat corp.                            9748               59
     4  silicon graphics corp.                   7736               47
     5  sap ag                                   7493               46
     6  mysql ab                                 5747               35
     7  netscape communications corp.            5249               32
     8  ximian inc.                              4985               30
     9  realnetworks inc.                        4412               27
    10  at&t                                     4286               26

And that's before the recent OpenJDK announcement!

Posted by alanbur ( Jan 16 2007, 08:31:23 PM GMT ) Permalink Comments [2]

20061201 Friday December 01, 2006

A useful little web access log reporting tool

One of the things that was on my to-do list after setting up the Meninos do Morumbi Oldham website was to do something on the reporting front with the server log files.  I'd already set up Tomcat to generate combined log format files by putting this in the server.xml file:

    <Valve className="org.apache.catalina.valves.AccessLogValve"
           directory="logs"
           prefix="access."
           suffix=".log"
           pattern="combined"
           resolveHosts="false" />

so I had the raw data I needed, I just needed to do something with it. In the past I've used AWStats to do log file reporting, but it is written in Perl and therefore needs a CGI-bin setup. This is easily done if you are running Apache, but I'm running stand-alone Tomcat, and although you can run CGI stuff under Tomcat, it isn't really recommended.

As is often the way, I was looking for something else entirely when I came across Visitors, a stand-alone log file analyser written in C.  It writes its report as either a single HTML or text file, and the report looked fine, so I could just run it from cron once an hour and put the generated report somewhere in the tree managed by MeshCMS.

One small additional wrinkle: as you can see from the server.xml entry above I've turned off DNS lookups for the access logs. The reason for this is that DNS lookups can take some time, and I don't want logging to slow down the web server. However it's useful to have resolved names for reporting purposes, so I pipe the log files through the Apache logresolve utility before feeding them into Visitors. At the moment I'm doing this each time I build the reports - I should really just do this once and cache the result, but that's a job for another day :-)


Posted by alanbur ( Dec 01 2006, 12:06:19 PM GMT ) Permalink Comments [0]