The other day I wrote about Chuvo, the Portuguese water dog we use to sniff out stale web content. We have another secret weapon called the PAges NObody Looks At (PANOLA) process that we deploy periodically. Chuvo and PANOLA work together to keep our web site clean. The process works roughly as follows, and you can follow it on your web site too:
  1. Create a dump of all URLs on your site. Call this AllURLs.txt.
  2. From your web metrics system, dump a separate list of all pages that got traffic. (Your metrics program won't know about pages that didn't get any traffic -- by design, metrics systems can only measure what does happen, not what doesn't happen.) Call this ViewedURLs.txt.
  3. Choose a cutoff number of views. We use Chuvo's 8-10 views-per-quarter for inspiration, but then move the bar a bit higher, since we figure any page on Sun.com ought to be well more popular than a Portuguese water dog. In practice we cut off all pages that get fewer than about 100 views in 90 days. Drop every page below the cutoff from ViewedURLs.txt and call what's left TrimmedViewedURLs.txt.
  4. Normalize the format of the URLs in TrimmedViewedURLs.txt so that they exactly match the format of the URLs in the AllURLs.txt dump.
  5. Run a diff comparing TrimmedViewedURLs.txt against AllURLs.txt.
  6. The URLs that appear only in AllURLs.txt are your list of purge candidates.
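For anyone who would rather script steps 3 through 6 than run them by hand, here is a rough sketch in Python. It is not the tooling we use, just one way the idea could look. The function names, the `example.com` URLs, and the normalization rules (lowercase host, no scheme, no query string, no trailing slash) are all assumptions; your own dumps may need different rules. One deliberate difference from the steps above: the sketch normalizes before applying the cutoff, so that views recorded under two spellings of the same URL are added together.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so both dumps can be compared.

    Assumed rules (adjust for your site): drop scheme, query string and
    fragment, lowercase the host, strip any trailing slash.
    """
    url = url.strip()
    if "://" not in url:
        # Assumption: scheme-less entries start with the host name.
        url = "http://" + url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return host + path


def purge_candidates(all_urls, view_counts, min_views=100):
    """Return normalized URLs from all_urls that fall below the cutoff.

    all_urls    -- iterable of every URL on the site (AllURLs.txt)
    view_counts -- mapping of URL -> views in the period (ViewedURLs.txt)
    min_views   -- the cutoff; pages with fewer views are candidates
    """
    totals = Counter()
    for url, views in view_counts.items():
        totals[normalize(url)] += views
    popular = {url for url, views in totals.items() if views >= min_views}
    return sorted({normalize(u) for u in all_urls} - popular)


if __name__ == "__main__":
    site = [
        "http://www.example.com/a/",
        "http://www.example.com/b",
        "http://www.example.com/c",
    ]
    views = {
        "www.example.com/a": 250,
        "http://WWW.example.com/b/": 60,
        "www.example.com/b": 50,
        "www.example.com/c": 12,
    }
    for candidate in purge_candidates(site, views):
        print(candidate)
```

Pages that never show up in the metrics dump at all land on the candidate list automatically, because they have no entry in the popular set.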
You're done. Almost. There are the social aspects:
  1. Send automated email to all the affected page owners and their bosses alerting them that their pages are about to be deleted. (You do have an up-to-date online list of page owners, right?) Set a deadline for them to reply.
  2. In parallel, visually inspect the pages on the purge list to understand the navigational implications of deleting the pages (some pages may be linked to by other pages, and you'll want to fix that... unless you have an automated publishing system that takes care of that sort of thing automagically.)
  3. Review the emails from the page owners, assuming you actually got any replies. Send another reminder and mention the deadline really clearly. Actually, send about four more reminders.
  4. (To keep up morale, send around short humorous notes within your team with the funniest or strangest content you find as you rummage through the purge list.)
  5. After consulting with anyone who has replied, revise your final purge list. Pick a day.
  6. Make sure you have a backup image of the site so you can recover anything quickly if you accidentally delete it.
  7. Purge the pages.
  8. Brace yourself for one or two frantic emails from page owners who ask "why wasn't I informed of this?" even though you mailed them about five times.
  9. Monitor site comments and 404s to make sure you didn't delete anything really important.
  10. Open the frosty beverage of your choice and celebrate a cleaner web site!
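The 404 half of step 9 can be partly automated. The sketch below is only an illustration: it assumes you have already pulled (URL, status code) pairs out of your server logs and that those URLs are written the same way as the ones on your purge list. The function name and sample paths are made up.

```python
from collections import Counter


def purged_404s(requests, purged_urls):
    """Count 404 responses that hit pages on the purge list.

    requests    -- iterable of (url, status_code) pairs from your logs
    purged_urls -- iterable of the URLs you deleted
    Returns (url, count) pairs, most-requested first.
    """
    purged = set(purged_urls)
    hits = Counter(
        url for url, status in requests if status == 404 and url in purged
    )
    return hits.most_common()


if __name__ == "__main__":
    log = [("/a", 404), ("/b", 200), ("/a", 404), ("/c", 404), ("/d", 404)]
    for url, count in purged_404s(log, ["/a", "/c"]):
        print(count, url)
```

A purged page that keeps collecting 404s is probably one that somebody still wanted, and a good candidate to restore from the backup.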
Tunes: 43: IQU: Crazy




This blog copyright 2009 by MartinHardee