Saturday, August 27, 2005

Spamflagging Saturday

Last Saturday, in a post discussing BlogSpot’s Flag button, I linked to six spam blogs I’d identified and flagged. Today they’re all gone; in fact, as a commenter pointed out, five of them were taken down within 24 hours.

Well, that’s power; maybe the Flag button does work for spam. Or maybe not: I did also send a backchannel email to Blogger Buzz, asking about spam policies, and linking to my recent posts.

So let’s have another go, this time on flagging alone. A quick spin through twenty clicks of the Next Blog button turned up five obvious spam blogs (as before, all links are nofollow):

Again, I’ve flagged ’em, I encourage you to follow suit, and we’ll see if they last the week.

[Update Saturday, September 03: all still there.]

Doc Searls suggested earlier this week that spam blogging is driven by AdSense advertising. I’m not convinced that this is entirely the case. More often than not, spam blogs on BlogSpot don’t carry AdSense ads themselves. Only one in the small sample above does, attempting to exploit the high prices paid for treatment and pharmaceutical keyword clickthroughs. The remaining four carry no adverts, but link back to parent sites in an attempt to game Google into ranking the parent higher.

Bonus link: the Fighting Splog blog takes a much more vigorous stand against spam blogs—with some success. Although, ugh, what a horrible neologism splog is; do bloggers have to coin a new word for every single concept they come across?

Categories: Spam

Thursday, August 25, 2005

Recipelog

Well, it’s been quietly running on and off for a month now, and it’s proved useful to me already, so: Recipelog. An experiment in blog as outboard brain. No more scrappy photocopies, newspaper clippings, or index cards for me: from now on, I’m posting recipes that I try out and want to keep to the recipelog. The first post goes into more detail on the whats and whys, and explores some of the copyright implications of reproducing recipes. It’s much more for my benefit than yours; but hey. Self-interest is what blogging’s all about.

(Recipelog home; feed; subscribe in Bloglines.)

And while I’m in touting-my-other-blogs mode, a reminder that I clip and comment, often sarcastically, on interesting links at my linkblog. Although it appears in miniature on the sidebar to the right—via a clunky mix of technology that I really should describe at some point—it’s also a standalone blog in its own right: feed, permalinks, archives, comments and all.

(Linkblog home; feed; subscribe in Bloglines.)

Wednesday, August 24, 2005

I am tired of the Google Blog

Sorry, Google; but the Google Blog is not providing “Googler insights into product and technology news and our culture”. It’s simply a series of press releases, massaged slightly to give them a more folksy tone; it just doesn’t ring true.

Today’s post announcing Google Talk carries the byline of a Google software engineer. But it has a strong whiff of marketing ghostwriting:

Google has a friendly talk-in-the-hallway kind of culture that I love, but Google engineers seem to be everywhere now, from Bangalore to Tokyo to Dublin to Zurich. I work on a team that’s in Mountain View, Kirkland, Pittsburgh, and Philadelphia. We like to talk about the projects we’re working on, but a hallway is hard to come by. So we’ve put together a gadget that keeps us talking, even when we’re on different sides of the planet.
I'm sorry, but no: I don’t buy it. Google Talk is clearly not some 20% project you knocked together to make life easier for yourselves; it’s strategic. There are plenty of existing instant-messaging services that a software team could have used to stay in touch.

The sound is great—usually much better than a regular phone—and it’s a perfect way to use that computer microphone you never realized you had. My laptop with its built-in mic makes a superb speakerphone. Google Talk also works great with just about any standard mic or headset you can plug in to a computer.
Oh please: no software engineer talks like this.

The Google Blog is a real wasted opportunity, because the world is full of people who would like to see deeper into Google’s operations, products, and culture. Microsoft have opened up considerably, both with hundreds of blogs (official and unofficial; Raymond Chen’s The Old New Thing is one of my favourites) by Microsoft employees and with their Channel 9 project. Google’s following is even more cultish than Microsoft’s. More open, and more human, communication might serve them well.

(As for Google Talk: meh. Introverts don’t do well on real-time chat; too much social pressure. I never liked IRC much; I never took to IM; hell, I still don’t much like the telephone. The move towards interoperability is interesting, though.)

Tuesday, August 23, 2005

Iron Horse / Shell Ridge Connect-the-Dots Hike

I’m carless today—Melinda’s visiting her parents—so for tonight’s evening hike, I set out on foot, connecting together a number of previously-hiked trails.

From the apartment I head north on the Iron Horse Trail to Walden Park, the starting-point of an earlier hike on the Contra Costa Canal Trail. That day we headed west; today, though, I head east on the Canal Trail towards Heather Farm. The Iron Horse Trail here is wide, dusty, hot, and busy; but the Canal Trail is shady and cool, and feels a lot more private.

Trail map of Iron Horse Trail, Contra Costa Canal Trail, and Briones–Mt. Diablo Trail

At Heather Farm I retrace my previous steps on the Briones–Mt. Diablo Trail, following it south where it forks off from the Ygnacio Canal Trail. At the end of a tarmac road, the trail suddenly gets narrow and steep, climbing up to run along a ridge; a big change from the flat, wide, metalled trails so far. And the views west start to open up. It’s hazy today, and the rows of hills fade into the distance.

The Briones–Mt. Diablo Trail connects to Shell Ridge Open Space just above the Marshall Drive trailhead, where we hiked from last week. I can’t resist climbing up onto Shell Ridge, although today I take the EBMUD access road. It’s a big improvement over slogging up on the Ridge Top Trail; steep, but short, and it climbs on the west side of the ridge so you get the breeze and the best views. I follow the Ridge Top Trail to the peak before it drops down the east side of the ridge. No coyote today, but I do see a group of young deer above me on the ridge; they very carefully watch me go past.

Trail map of western Shell Ridge Open Space.

I take the Ginder Gap Trail south before returning west on the Briones–Mt. Diablo Trail. It’s 7:40pm, the sun is starting to sink, and I’m starting to get a little nervous about having mistimed things; I don’t want to be still in the park in total darkness. I press on a little. Today’s sunset is not nearly as dramatic as the one we lucked upon last week; a warm glow behind the hills, but no spectacular cloudscape today. Lots of dogs getting their evening walks on this section of the trail.

At Indian Creek I take the Fossil Hill Trail south before switching to the Kovar Trail west, which heads out of the park through Howe Homestead Park. It’s definitely getting dusky now, and for a while it’s mildly disquieting: a lot more rustlings in the undergrowth than during the daytime. But the sensation passes and I start enjoying the solitude, the cooler air and the sounds of the crickets.

And suddenly, the Kovar Trail crests the ridge and I’m back in civilisation again: the lights of downtown Walnut Creek spread out below me, traffic noise drifting up the ridge, and the tick-tick-tick of sprinklers as the houses backing onto the trail do their evening watering.

Down to Howe Homestead and back, arriving home in the dark at 8:30pm, three hours and some 7½ miles after starting. For a scratch hike, this worked out very well; the flat trails at the start providing a good warmup for, and contrast to, the hills later on.

Categories: Hiking

Maybe it’s working…

…the exercise regime, that is: an average of 80 lengths in the pool every day, or a hike. Yesterday, buying shorts in Target, and for the first time in a long time: down two inches at the waist. It could be the usual vagaries of clothing sizes; but I prefer the optimistic view.

Maybe this cancels it out, though:

Limited Edition KitKat Coffee wrapper.

Oh yes: KitKats aren’t as good here (they’re Hershey chocolate, not the Nestle chocolate of the UK versions) but they still come in weird limited editions. This one I’m not quite sure about. “Artificially flavored”, says the label, but it’d be pretty easy to spot as an artificial flavour without the hint. It smells very strongly, and as soon as you tear open the wrapper, of fake coffee. But as a mixture with the chocolate, it kind of works; I think I quite like this one.

I paid 40 extra lengths in the pool for that.

Wikipedia has a long, but not (I believe) definitive list of KitKat varieties; Japan gets some really odd ones.

Categories: Food

Round Valley: We walk the Miwok

A Sunday-evening hike at Round Valley Regional Preserve. bahiker.com says this hike is “moderately easy”; more moderate than easy, I’d say.

Round Valley’s a smallish preserve to the south-east of Mount Diablo, pretty much the opposite side of the mountain from Walnut Creek. The drive in on Marsh Creek Road is wild: rolling hills and tight bends, and what’s that dead pig doing by the side of the road?

The park itself is hot and dry: too hot to hike during the day. We follow the bahiker.com route, which loops anticlockwise on the Miwok Trail—a wide, dusty, stony fire trail, alongside Round Valley Creek, which at this time of year is completely dry—before returning on the Hardy Canyon Trail. This is where it starts getting tough. bahiker.com says: “a steady ascent, not hard but with no respite”. “No respite” is right; it’s a relentless 600 foot climb. “Not hard” is not quite so right. It’s tough enough that Melinda threatens to turn around. There are good views behind us of Mount Diablo, gradually becoming more visible above the hills as we gain height, but we’re too winded to appreciate them.

After cresting, the trail heads downhill alongside High Creek (also completely dry) and back to the parking lot. There’s an interesting view from the top, where the flat expanse of the Central Valley is briefly visible behind the intervening foothills; not all of Northern California is hilly.

Just over 4 miles, but the climb in the middle dominates it. Good exercise, but not so much fun. Round Valley would probably be more interesting, and less dry and dusty, in spring or fall; and I’d consider doing it the other way around, too. Also: it’s not so good doing it close to sunset, as you end up driving home directly into the setting sun.

Categories: Hiking

Monday, August 22, 2005

Sibley Cloudwalk

I’m running a little behind; this is last Friday’s hike, at Robert Sibley Volcanic Regional Preserve, one of the string of regional parks along the Berkeley hills.

Sibley’s a small park, and one of the oldest of the East Bay Parks; 660 acres surrounding Round Top, a long-extinct volcano. On a clear day, it should have spectacular views across the East Bay. Only one problem: Friday was not a clear day. Oh, it was clear and hot in Walnut Creek; but the clouds were low over the hills and as we climbed up Grizzly Peak Boulevard on the way to the park it got dimmer and colder and foggier. By the time we got to Sibley we were completely within cloud, with visibility down to a few hundred feet.

But actually, walking in cloud is not a bad experience; it’s very otherworldly, the views replaced by walls of white, wisps of fog blowing across the trail, and water dripping from the trees.

The park map recommends a route with a series of numbered stops at sites of geologic interest. But frankly, if you’re not into rocks, they’re not really that interesting: one breccia’s much like another. The route leads out to the edge of the public land on the Volcanic Trail, the remaining part of the park being a closed land bank, and then loops back on the Round Top Loop Trail. Most of it is open and scrubby hillside, on wide fire trails. However, the eastern side of the loop trail narrows, climbing to a ridge; here the fog thins for a few moments providing a sudden, and fleeting, glimpse of Mount Diablo. The trail dips down into mixed forest of eucalyptus, bay, and pine. (Which is handy: I’m all out of bayleaves, so I pick a pocketful of bay to take home and dry out. California bay is stronger than domestic bayleaf; a little goes a long way.)

Trail map of Sibley.

I don’t feel like I’ve walked far enough, so we walk the access road up to the top of Round Top and back. The journey is better than the destination; weak sunlight shines through drifting fog in the old oak trees. The peak is wooded, and dominated by transmission towers (American Tower T1, T2, T3; Western States Teleport) with no real views in any direction. Back down the road to the parking lot.

About 3½ miles; too short for me, really, but probably worth returning to on a clear day for the views. Check the weather first, though: maybe on this webcam.

And on to Berkeley. Naan’n’Curry on Telegraph is excellent. I’ve missed a good curry; and this is great and cheap. Really good naan; tasty curry, although a little bony; and Melinda’s vindaloo is suitably vinegary rather than simply hot.

Amoeba is still dumping k.d. lang into the clearance bins; I picked up a copy of 1997’s Drag for $1. Amazon has used copies cheaper, but not once you’ve paid the $2.49 shipping charge. But in general I'm less impressed by Berkeley's used record shops (Rasputin is the other big one) than I used to be; on most items, they're simply not competitive with Amazon Marketplace.

Categories: Food, Hiking

Saturday, August 20, 2005

BlogSpot's Flag button: assuming the worst

An anonymous comment to my previous post on BlogSpot’s Flag button raised one point worth expanding on further:

It’s amusing to read all these conspiracy theories about what Google will do with this feature. Do you just assume the worst possibility and that the company that brings you a free blogging and hosting tool is to harm its users?
Ah, it’s speculation when the outcomes are good, but it’s conspiracy when they’re bad; and it’s unthinkable to question the hand that feeds you.

Well, piffle. It’s never wrong to question or to explore possible outcomes.

Are Google setting out to intentionally harm BlogSpot users? Obviously not. But there’s an important point here, which is this: simply believing your actions are harmless—Google’s often-quoted “don’t be evil” credo—isn’t always enough to prevent harm from happening. The most well-meaning action can be harmful, whether by ignorance or by unexpected consequences.

Arguably, Google’s custodianship of Blogger, since its acquisition, hasn’t been that great for BlogSpot users. Not because Google has actively harmed us, but because by failing to act decisively on the growing spam problem it has gradually eroded our reputation: as I commented earlier, to the point at which there are calls for BlogSpot to be excluded from search engines. “Probably spam” isn’t a nice pigeonhole for those of us running legitimate blogs here to be put into.

I saw the Flag button as a good thing because I saw it as a sign that Google’s finally woken up to the BlogSpot spam problem. But I’m still on the fence as to whether it’ll prove to be a good or bad thing in the long run: as I said in my initial post, it all depends on how Blogger staff respond to flagged blogs.

A recent post on the Blogger Buzz blog attempts to defuse some of the concerns:

We’re not automatically removing content based on the flags. We’re using the feedback from Blog*Spot readers to help assess what the community has noted as potentially objectionable.
So far, so good: the Flag button is a way to bring problems to a human moderator’s view. But oddly, both the original announcement and the clarification back away from any mention of spam:

To clarify, our primary concern is to avoid promoting objectionable content in places like NextBlog or the Dashboard.
How odd. So, it’s more important to keep the occasional “fuck” off the Dashboard than it is to address the search engine noise caused by BlogSpot’s deluge of spam blogs? I took a quick random spin through 10 blogs; 6 of them were spam (all links are nofollow):
I’ve flagged ’em all; feel free to follow suit. I’ll check back in a week or two to see if anything’s happened.

[Update Saturday, August 27: all gone.]

Is the Flag button intended for reporting spam at all? Blogger Buzz doesn't carry comments, so I’ve sent them an email asking for further clarification:

Shouldn’t your primary concern be addressing the huge problem of BlogSpot spam blogs? The Blogger Help page on the Flag button discusses spam, but neither of the Blogger Buzz announcements have made any mention of it.

So, is the Flag button intended as a mechanism for reporting spam, or is it simply for dirty words? And if it is a spam-fighting tool, why not tell your users that it is? We'd be more than happy to help you out by flagging spam when we see it.


Categories: Spam

Thursday, August 18, 2005

More on BlogSpot's Flag button

BlogSpot’s new Flag button seems to be getting a lukewarm reception: the Blog Herald is sniffy about it (“a half-arsed effort”) but also appears to misunderstand the Blogger Buzz announcement, misinterpreting “a blog has to be republished for this new button to show up” as “it only applies to new blogs”. Sorry, but no, that’s not what “republished” means: Blogger republishes a blog when new content is added or when the blogger makes changes. My blogs predate the Flag button but, since I posted new content, carry it.

Weblogs, Inc.’s Unofficial Google Weblog picks up the Blog Herald report and runs with it, perpetuating the “only new blogs” misconception. A familiar pattern of repackaged, and underresearched, reporting.

The Blog Herald report does raise one interesting point: could the Flag button encourage denial-of-service attacks against individual blogs? For example, could a pro-choice blog be taken down by an organised anti-abortion email campaign of “visit this blog and flag it”? Or vice versa? Hopefully those reviewing the flag reports are level-headed enough to avoid this. Similarly, could spammers attempt to hide themselves amongst the noise by orchestrating mass flaggings of innocent blogs?

One other obvious thought struck me, though: once spammers realise that carrying the Blogger navbar on spam blogs increases their chances of being taken down, won’t most of them simply remove the navbar? Although this is against BlogSpot’s terms of service, it’s trivially easy to do: a quick Google search turns up many pages explaining how, including one ironically itself hosted at BlogSpot.

(Personally, I don’t mind the navbar; the search box is moderately useful, the Next Blog button sometimes fun for serendipitous surfing, and carrying the navbar is a small price to pay for free hosting, particularly as the alternative would probably be carrying advertising. I do however remove it when styling for print—try a Print Preview to see the result—as it doesn’t seem useful to either me or BlogSpot on the printed page.)

And one wild thought: I would assume the blog search engines—Technorati, Feedster, Blogpulse and the like—have developed algorithms for filtering spam blogs out of their results. Wouldn’t it be nice to close the loop and feed lists of identified spam blogs back to Blogger so they could act on them? Interestingly enough, Technorati is rumoured to be on the market, with Google often mentioned as a potential buyer. Hmmm.

[Update: the Blog Herald and Unofficial Google Weblog posts have been corrected.]

[Update: the Blog Herald reports that the black-hat community is already considering spam reporting and flagging schemes as gameable: “‘Bloggerbowling’: the practice of having robots flag multiple random blogs as splogs regardless of content to degrade the accuracy of the policing service.”]

Categories: Spam

Wednesday, August 17, 2005

Shell Ridge 4: Monday Night Nature Hike

Monday was the first time we’ve hiked in the evening. We usually hike late mornings and afternoons, and I think we’ve been missing out: there seems to be a lot more wildlife about at dusk.

We start at about 6pm at the Marshall Drive trailhead and head up onto Shell Ridge itself, taking the official trail this time: the last time I hiked this way I took an unofficial scramble up the end of the ridge. There’s not much to recommend this trail: it’s on the wrong side of the ridge to catch any breeze, making it a long hot uphill slog. Next time I’m taking the paved road straight past the EBMUD water tank.

First nature spot, though: Melinda stops me to flick a big red tick off my trouser leg. Ugh. I’m glad now I suggested long trousers rather than shorts; we follow the CCMVCD advice, tucking trousers into socks, shirts into trousers, and stopping every 15 minutes to check for ticks.

The trail eventually switchbacks its way onto the ridge, where the views are, again, spectacular. And at this time of day we can see the cloud and fog rolling in from the Bay and spilling like water over the Berkeley hills. Hawks circle on the breeze coming up the ridge, and something fat, furry, and with blurred wings buzzes us: not a hummingbird, but a hummingbird moth.

Out along the ridge, we circle the peak, coming back past my old friend NUECES 1946; still there. And just past the NGS marker, we see an ant migration: a procession of ants moving their entire colony from a hole on the trail to another hole some 12 feet away. Ants leave the original nest carrying white ant eggs and grass seeds; carry them to the new nest; and return emptyhanded for another load.

Trail map of Lime Ridge Open Space

We drop down the back of the ridge again, which is where we make the most interesting sightings: first a deer, picking its way down the side of the ridge. And then, five minutes later, a young coyote passes us, some 30 feet lower down the ridge, before climbing back up to rejoin the trail a way in front of us, where he nonchalantly leaves a scent mark before trotting off around the corner. My first coyote, and an amazing view of it. It’s smaller than I had imagined: like a large fox, rather than a small wolf, although without the bushy tail of a fox. We meet him again a few minutes later as he darts back around the corner and down the ridge to avoid a jogger coming the other way.

There’s good views from here of the Concord valley, which—thinking about it—we’ve hiked a good deal of: Shell Ridge to the south, Lime Ridge and Concord Open Space to the east, and the Contra Costa Canal Trail running through the middle of it.

From the Ridge Top Trail, we take the Costanoan Trail north, covering some of the same ground we did last time, before dropping back down on the Upper Buck Trail, past Deer Hill, and returning on the Corral Spring Trail. I’m keen to explore further on the Lower Buck Trail and the Deer Lake Trail, but Melinda’s flagging: we leave it for another day.

And suddenly, on the way back, the sunset starts; the hills behind us are painted in extraordinary oranges and pinks, and the streaks of cloud overhead glow in brilliant golds and oranges. Just amazing. We stop at the trailhead (which has a handy 300ft elevation, enough to see over downtown Walnut Creek) to watch it fade.

A quick hike—about 3½ miles in about 2 hours—but a really good one: a lot to see. I’m thinking of returning and timing it carefully to hit sunset on the ridge with a picnic.

Categories: Hiking

Dear ABC…

…please stop skipping episodes of Lost in the summer rerun. You got me hooked on the first eight episodes, and now you’re leaving me dangling. Lost is all about the slow reveal; showing every other episode, as you've been doing for the past few weeks, ruins the continuity.

At least your episode guide is comprehensive; but y’know, I'd much rather watch the missing episodes than read about them…

Well this is interesting

It looks like Blogger’s finally starting to get more active in combating BlogSpot spam blogs. Look what’s just appeared on the navbar:

Fragment of Blogger navbar, showing new Flag button: “Notify Blogger about objectionable content. What does this mean?”

The “What does this mean?” link leads to a Blogger help page explaining the Flag button:

The “Flag?” button is a means by which readers of Blog*Spot can help inform us about potentially questionable content, so we can prevent others from encountering such material by setting particular blogs as “unlisted.” […]

For more serious cases, such as spam blogs or sites engaging in illegal activity, we will continue to enforce our existing policies (removing content and deleting accounts when necessary).
Self-policing moderation can work very well (in an earlier post I discussed its use on the Motley Fool discussion boards) but only if the moderators respond promptly to reported problems; it remains to be seen how actively Blogger will respond to flagged spam blogs.

Note also that, so far, it’s only appearing on blogs which have been updated since around midday today; less active or abandoned spam blogs currently still carry the old navbar. (For example, both the spam blogs I referenced as examples earlier lack the Flag button, although they’re still presumably happily generating PageRank for whatever they’re spamming.) Blogger could easily sidestep this by forcing a republish of all inactive blogs.

I can’t help wondering what finally made them take action? Suggestions that BlogSpot be excluded from blog search engines? Or maybe feedback from Blogger’s recent user survey? (My first answer to the “how could we improve Blogger?” question: “do something about spam blogs”.)

But whatever the reason: it’s nice to finally see something being done.

There doesn’t seem to be an official announcement yet, but Blogger’s Greg Stein tips the wink in this comment at intertwingly: “take a look at the navbar now…”

[Update: Blogger Buzz announcement.]

Categories: Spam

Sunday, August 14, 2005

The Twice-As-Good Rule

Microsoft recently launched MSN Virtual Earth, their response to the hugely popular Google Maps: the reaction from the blogosphere was a mixture of polite approval, yawns, and howls at outdated imagery and lacking international support. Why, as Robert Scoble put it, “the tidal wave of negative publicity”?

Because Virtual Earth is playing catchup to Google Maps. When you’re introducing a technology or product intending it to supplant existing competition, there’s one vital rule of thumb that applies:

If you want to be adopted enthusiastically, you’ve got to be twice as good as what’s gone before.

Put more bluntly: you’ve got to wow people.

Google came late to mapping, but they came in strong. Google Maps, with its huge, clear maps, its click-and-drag usability, and its responsiveness, was clearly hugely better than existing mapping websites. It wowed people; and they loved it.

Unfortunately for Microsoft, this means that Virtual Earth, to be received enthusiastically, needs to be twice as good as Google Maps. Well, it’s good; in some ways, it’s better than Google Maps. But it’s not twice as good; and that’s not good enough to wow people away.

Google are, of course, no strangers to wow. Google search wowed us when it was introduced; spare, fast, and amazingly good at bringing back relevant results, it was hugely better than Altavista, Lycos, and the other search engines of the time. Everyone else has been playing catch-up since, but nobody’s yet made the twice-as-good leap that would have them replacing Google. Looking like Google or acting like Google isn’t good enough; you need to be much better than Google.

Apple’s iPod was twice as good as the competition when it launched: sleeker, sexier, easier to use, hugely more capacious. But now new competitors have to be twice as good as the iPod; a difficult task.

CD is twice as good as cassette tape and vinyl, at least to most of us; now it’s all but supplanted them. But SACD isn’t twice as good as CD, so it remains a minority format. DVD is twice as good as VHS, and now videotape’s disappearing.

Windows 95 was twice as good as Windows 3.1: by dropping Program Manager, File Manager, and the clunky window-within-window metaphor, Windows 95 was the first in the Windows line that was actually easy to use. Have subsequent Windows versions been twice as good as their predecessors? Probably not—although Windows XP does finally seem to have made Windows stable. And this is the problem that Microsoft’s upcoming Windows Vista faces: to be received enthusiastically, it has to be seen as twice as good as Windows XP. Again, a difficult task.

And nobody gets particularly excited about PCs, because pretty much all PCs are pretty much the same. Despite Scoble’s evangelism, the Tablet PC format is not getting much buzz. Why? Because to most people, a Tablet PC doesn’t seem twice as good as a laptop.

Wednesday, August 10, 2005

TV technology, slippery reporting, and cultural bias: Promise TV

A story that did the rounds last month: Promise TV’s demo at OpenTech of a personal video recorder which, unlike current commercial product, records multiple channels simultaneously.

Some necessary background: the word multiplex has a very specific meaning in the digital TV arena. A multiplex is a collection of services (TV, radio, or data channels) grouped together into one massive stream of data, also known as a transport stream. A multiplex is broadcast on a single specific frequency on the distribution system, which may be terrestrial, cable, or satellite. This is very specifically different to analogue TV, in which a frequency carries only one service; digital TV compresses the data to squeeze multiple services onto each frequency. A network consists of one or more multiplexes; for example, the UK’s terrestrial Freeview network is made up of six multiplexes. And finally, the distribution system may itself carry multiple networks; for example, the same satellite transponder may be used to carry programming from several different network providers.

What this means is that to show, or record, a digital TV service, you need to do two levels of filtering: first you tune to the relevant carrier frequency, which gets you a transport stream full of services; and then you fish out the service you want and discard the rest. It also means that if you want to show or record two services simultaneously, you may need to tune to two different frequencies simultaneously, depending on whether the two services are on the same or differing multiplexes. Current PVR/DVR boxes handle this by having two tuners.
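To make the two levels of filtering concrete, here’s a minimal Python sketch. The line-up is entirely made up (the multiplex and channel names are placeholders, not the real Freeview assignments), and the function names are mine; the only point is that the number of tuners you need depends on how many different multiplexes your chosen services sit on.

```python
# Hypothetical line-up: which multiplex (carrier frequency) carries which services.
# These names are illustrative placeholders, not a real broadcast line-up.
MULTIPLEXES = {
    "mux_1": {"Channel A", "Channel B", "Radio X"},
    "mux_2": {"Channel C", "Channel D"},
    "mux_3": {"Channel E"},
}


def multiplex_for(service):
    """First level: find the multiplex (frequency) that carries a service."""
    for mux, services in MULTIPLEXES.items():
        if service in services:
            return mux
    raise KeyError(f"unknown service: {service}")


def demultiplex(mux, wanted):
    """Second level: keep the wanted services from one transport stream,
    discarding everything else carried on that multiplex."""
    return MULTIPLEXES[mux] & set(wanted)


def tuners_needed(wanted):
    """One tuner per distinct multiplex: a tuner locks onto a single
    frequency, after which every service in that stream is available."""
    return len({multiplex_for(service) for service in wanted})
```

With this line-up, tuners_needed(["Channel A", "Channel B"]) is 1, because both services share a multiplex, while tuners_needed(["Channel A", "Channel C"]) is 2.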

The other thing that’s important to know is that a transport stream carries a large amount of data at a formidable data rate: 40Mbit/s is a fairly typical rate for a satellite multiplex. This is fast enough to fill a 160GB hard drive in about nine hours. Current PVR/DVR boxes sidestep this torrent of data by recording services, not entire transport streams: a single service at under 5Mbit/s is a lot more manageable. The cost of this, though, is that it makes recording selective: you have to tell the box what to record beforehand. Good programme information helps you to choose and schedule recordings, and some products (like TiVo) record programmes speculatively based on your previous habits. But still, the much-vaunted ability to pause and rewind live TV only applies to the channel you’re watching: if you’re watching ABC and realise that you’ve missed the start of the movie on NBC, you’re out of luck.
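For anyone who wants to check the arithmetic, here’s a rough back-of-envelope calculation in Python. It assumes decimal units (a gigabyte as 10^9 bytes, a megabit as 10^6 bits) and a constant data rate, neither of which is guaranteed in practice; the function name is my own.

```python
def hours_to_fill(drive_gigabytes, stream_mbit_per_s):
    """Hours of continuous recording before the drive is full.

    Assumes decimal units and a constant bit rate: a rough estimate only.
    """
    drive_bits = drive_gigabytes * 1e9 * 8            # bytes to bits
    seconds = drive_bits / (stream_mbit_per_s * 1e6)  # bits / (bits per second)
    return seconds / 3600


whole_multiplex = hours_to_fill(160, 40)  # recording the entire transport stream
single_service = hours_to_fill(160, 5)    # recording one service at 5Mbit/s
```

On those assumptions the whole multiplex fills the drive in a little under nine hours, in the same ballpark as the figure above, while a single 5Mbit/s service takes eight times as long, about 71 hours.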

This is Promise TV’s premise: rather than record selectively, why not simply record everything and let you sort it out later? Their prototype is PC-based, with what appears to be three DVB-T (digital terrestrial) tuner cards, and a boatload of hard disks—the last making it a furiously expensive endeavor, although storage prices are always falling.

But here’s where the slippery reporting begins. Eyewitness reports, and Promise TV’s own recently-posted description, state that the prototype records twelve services from three of the six Freeview multiplexes. And Freeview, being free-to-air, carries a lot less programming than the UK’s pay-to-view cable and satellite services.

Cory Doctorow led off with a breathless report in BoingBoing:

What the Promise does is grab the entire broadcast TV multiplex—all the channels being broadcast in the UK—slices them up according to the free, over-the-air electronic programming guide, and stores an entire month’s worth. Why program a TiVo to get certain shows for you when you can record every single show on the air, all at once[?]
Whoa there: careful with that terminology. “Multiplex” has, as I explained above, a very specific meaning in the TV field; it certainly does not mean “all broadcast channels”. Grabbing a multiplex simply means you’re grabbing a collection of channels.

But that misunderstanding aside, there’s also some terribly imprecise reporting here. Firstly, the Promise demo clearly didn’t record “all the channels being broadcast in the UK”; it recorded a subset of the Freeview channels. And secondly: “all the channels being broadcast in the UK” is terribly vague in itself. What exactly constitutes “all channels”? Just as in the US: what channels you receive depends on what provider you subscribe to. And as I noted above, Freeview is itself a small subset of what’s available on the subscription providers.

But it’s too late to stuff the “records everything on UK television” meme back into the bottle: it makes too attractive a hook for other reporters. Daniel Terdiman at CNet seems to have picked up the BoingBoing report (although, to be fair, he did also talk directly to Promise TV):

When Cory Doctorow visited last weekend’s OpenTech conference in London, he was stunned to see a box about the size of a 1990-era VCR boasting some pretty forward-looking capabilities.

The box was a prototype of a digital video recorder from Ascot, England, start-up Promise TV that can record and index an entire week’s worth of British digital-television programming.
While this does accurately reduce the recording time from BoingBoing’s reported month to a week, note again the suggestion of “records all programming” and the vagueness over exactly what “all programming” means: “digital television programming” covers a lot of different transmission methods and providers.

Ryan Block at Engadget picks up reporting directly from the CNet piece:

Not that we have a problem with a bigass 3.2TB DVR intended to basically intended to record an entire week’s worth of televised programming—is the case with Promise TV’s shortly forthcoming device they showed off at OpenTech [...]
Again, the suggestion is that it records “all programming”. Although Ryan does pick up on the issue of multiple tuners, he gets a little lost in the technology:

Second are the tuners: what, you going to rock a tuner dedicated for every channel?
No: you’d need (“to rock”?) a tuner dedicated to each multiplex; still a lot more tuners than current boxes, but not exponentially more. In the comments, Ryan is defensive about his second-hand reporting:

Maybe if I was at OpenTech, which I wasn’t, or if Promise published any information on their device, which they didn’t. I only have what I’ve got to work with, and that’s a couple crappy, information-light articles.
Maybe crappy, information-light articles don’t form a strong foundation for further reportage, hmm?

And then the reporting takes a stranger turn. Eric Hellweg at Technology Review described the demo as:

A prototype personal video recorder (PVR), called Promise TV, that successfully recorded and stored all the shows running for a week on all 12 channels in the UK.
“All 12 channels”? What on earth has happened here? My guess: the pervasive “records everything on UK TV” meme started by BoingBoing got conflated with some more accurate information (“12 channels”) on the prototype’s capabilities. Throw in some vaguely-remembered cultural bias—“oh, Britain, they don’t get much TV there do they?”—and we get the resulting nonsensical statement: the UK has only 12 TV channels.

My suspicion of cultural bias is strengthened by Eric’s closing sentence:

Then again, [Promise TV developer] Ludlam probably hasn’t experienced the literally hundreds of channels available in the United States—not all of them as must-see TV as, for instance, classic episodes of Monty Python’s Flying Circus.
Uh-huh, yep, that’s exactly what British TV is. Who needs hundreds of channels? All Monty Python, all the time, that’s us.

Well, no. While the UK still only has five analogue terrestrial channels, there’s no shortage of multichannel digital TV in the UK. And it’s arguably more advanced than the US. The UK adopted digital terrestrial very early (although with mixed success; the current free-to-air offering was built from the remains of a failed subscription service) and Sky were similarly aggressive in pushing towards digital satellite. You want hundreds of channels? We’ve got hundreds of channels.

The last link in the chain is this frankly bizarre report by Jen Seagrest at TV Squad, which directly links the Technology Review article:

Ever wish your DVR recorded more than two channels at once? [...] Evidentally the Brits want it too as I guess there is a dire need to record the Snooker matches on all four broadcast network channels at once.

Promise TV is a product in the making at the BBC labs in the UK. It will record every channel at once, not just the two that Tivo and other DVR’s can do presently.

To be fair the UK only has 12 channels total on thier satellite system. If they could get it to record 120 channels at once that would be getting somewhere. (Of course, if they could get more than 12 channels in the UK I’d move there in a second.)
As I said in the comments there: “pack your bags”. The UK gets way, way more than 12 channels on all its digital services.

The shift in focus to satellite is an odd invention; the Promise TV demonstration was clearly on digital terrestrial, although most of the subsequent reporting has just vaguely said “television” without specifying the distribution system.

But the cultural bias here is clear: Jen believes, and wants to believe, that UK TV is backwards. “All four broadcast channels” puts a subtle emphasis on “all”: oh, those poor wacky snooker-loving TV-deprived Brits. It seems churlish to mention that it’s actually been five channels for eight years now, or that the endless hours of snooker were always confined to BBC2. She accepts, and embellishes upon, the claim that UK TV has only 12 channels without challenging it with even the most cursory of research.

This is poor reporting; and a good example of why mainstream journalists criticise bloggers. TV Squad, as part of the Weblogs, Inc. portfolio, is positioned as a trade publication; its bloggers are paid; is it too much to expect at least some journalistic standards?

Saturday, August 06, 2005

Contra Costa Canal Trail 3: Pleasant Hill

Another day, another stretch of the Contra Costa Canal Trail. The weather today is blazingly hot, but an 8:30am start means we avoid most of the heat and most of the direct sun.

Start at Walden Park, just south of Pleasant Hill BART station; there’s a small parking lot here which was almost full when we arrived. It would seem we’re far from the only early walkers today. (Plenty of roadside parking on Jones Road, should this lot be full.) The Contra Costa Canal Trail runs east-west along the top edge of the park, crossing the Iron Horse Trail which runs north-south; although we drove to the start point today, we could have walked the 1.7 miles from home to here.

Satellite view of Walden Park and trails.

We head east on the Contra Costa Canal Trail, which crosses below the BART line, the freeway, and North Main Street before running a short while in an earth-banked cut. This section of the trail isn’t so nice; the sections near the freeway run past light industry and auto shops, and the cut is dull and bakingly hot.

But soon enough we turn north and head up through suburban Pleasant Hill. And this is a lot nicer. The canal, and so the trail, run along the backs of properties rather than alongside the roads; so it’s shady and quiet. Quiet, that is, apart from the other trail users, of which there are many. Saturday morning is obviously peak time.

Trail map of Contra Costa Canal Trail in Pleasant Hill.

At Lockwood Lane we turn around and head back, making this a roughly 5½ mile, 2 hour walk. And a very nice walk: cooler and more interesting than the arid stretch through Ygnacio Valley. I’m keen to hike the final stretch through northern Pleasant Hill.

Categories: Hiking

Thursday, August 04, 2005

“Without permission”

Dave Winer’s all in a flap over adverts appearing in one of the feeds he subscribes to. Well, I sympathise: I’m not keen on adverts in feeds either.

One of the secret joys of reading feeds, rather than webpages, is that you sidestep all the adverts. It was with a sinking heart that I read the Google AdSense for Feeds announcement, and with an inner cheer the later reports from beta users that they’re not finding such ads effective.

However, Dave’s latest post on the subject is bizarre:

It only seems fair to say that I unsubbed today, and that’s the last time you’ll hear about it here. He brought the ads back, without notice, without permission of the readers.
“Without permission”?

Since when does a publisher, of any type, need its readers’ permission to make changes? There’s no implied contract, when you subscribe to a feed, that the content will remain exactly to your liking.

As a reader, you have the power to vote with your feet—as Dave has done—by unsubscribing. You have the power to voice your opinion—as Dave has done—by commenting. But you do not, and should not, have the power to veto changes in what’s being published. That’s not your content to control.

As for Dave’s assertion that “advertising is so over”: you wish. Google AdSense, and the newly-in-beta Yahoo! Publisher Network, have a very clear goal: to let anyone, no matter how small, become a publisher of advertising. Google’s text ads are everywhere. Advertising’s not dead; it’s becoming more and more ubiquitous.

Ultimately, I suspect solutions to controlling overreaching advertising will be both social and technological. Remember what happened to popups? They were everywhere; readers complained, and complained, and complained; and then three things happened.

  1. Popup advertising started being less effective for advertisers, as readers became jaded and frustrated with them.
  2. Publishers started to reject popup advertising because of its negative effect on readers.
  3. Popup blocking software went mainstream, first as part of the Google Toolbar and then built into Internet Explorer.
Now we rarely see popups; and when we do, they’re a reliable indicator that we’re in a seedy backwater of the web.

I suspect the same will happen with other forms of advertising. When it becomes too much, readers will vote with their feet and stop visiting. And technologists will vote with their keyboards and start building adkilling tools.

This is already happening: the GreaseMonkeyUserScripts wiki lists Greasemonkey scripts for hiding AdSense adverts; for disabling IntelliTxt links; and a clutch of scripts for removing feed advertising from Bloglines.

Not that any of that helps Dave, of course; for despite wanting to be asked for permission before publication, he’s vehemently against content modification after publication.

Wednesday, August 03, 2005

Gaming the system: hidden ads and comment spam

There’s an interesting shift in spam on the web: Google and other search engines now have so much power that spam is increasingly being targeted at search engines, rather than at humans. Links are important in raising your position in Google’s rankings, so the more links you can throw out to yourself, the higher you go.

One twist on this that seems to be increasing recently: spam that’s visible only to search engines. CSS makes it relatively easy to include elements on a page which are made invisible to readers: one way to achieve this is to position the elements outside the page boundary.

A recent high-profile case was this story, about hidden articles on the Wordpress website:

These articles are designed specifically to game the Google Adwords program, written by a third-party about high-cost advertising keywords like asbestos, mesothelioma, insurance, debt consolidation, diabetes, and mortgages.
The twist in this scheme: hoist these hidden articles up the Google rankings by linking to them from the very-highly-ranked Wordpress home page. Arve Bersvendsen describes how:

The key here being the -9000px text indent: This makes the link invisible to human visitors with CSS, and visible to every search engine on the planet.
After a community outcry, the articles and the hidden links were removed.

More recently, The Republic of Geektronica discussed BlogSpot spam blogs:

A large percentage (maybe up to a third) of all Blogspot blogs are spam-logs—sites created to increase the Google ranking of some other site (which is itself usually a Google-spamming site). The ultimate purpose of these spamlogs is usually to drive traffic to a commission-paying pharmacy, pr0n, or casino site.
BlogSpot spam, despite Blogger’s protestations otherwise, appears endemic. In a quick spin through ten “next blog” clicks, I found two obvious spam blogs: leftists bunting, which seems to mix autogenerated text with spammy links; and kaar028, which links from the article titles and stuffs the bodies full of keywords.

Geektronica continues:

Spammers are becoming less obvious by creating posts that link to actual news articles (complete with excerpts); by all appearances, these blogs are just like scores of real blogs. But if you look at the code of the page, there are tons of external spam links, cleverly hidden by CSS. […] With this additional layer of subterfuge, it’s remotely possible that someone will even link to [such a] blog from their highly-ranked site.

[Note: the original post links to an example of a blog using this trick, which has since been removed by Blogger.]
So, while CSS has been an enormous boon to the web, in allowing web designers enormous flexibility and expressiveness, it’s also handed a valuable weapon to spammers: you can never be sure that what you see is the same as what a machine sees.

Earlier this week I spotted a new example. This comment, on Accordion Guy’s blog, looks innocuous enough. But take a look at the source:

Good...<div style="position: absolute; top: -1000px; left: -1000px; visibility: hidden;">The true fast way to enjoy and catch luck is Free online poker. <A href="http://online-poker-rooms.t35.com/z1.html"><strong><font size="+2">Hundreds fans come onto Free online poker constantly. </font></strong></A>. Invite your friends about Free online poker immediately and to get true real cash together. </div>
Yep: there’s spam there, safely tucked out of sight off the top-left of the page.
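The hiding tricks seen so far (large negative offsets, a large negative text indent, `visibility: hidden`) are regular enough to spot mechanically. Below is a rough heuristic sketch, not a robust detector: the function names and the 500px threshold are my own assumptions, it only looks at inline `style` attributes, and it will miss anything hidden through an external stylesheet.

```javascript
// Parse an inline style string into a property -> value map (lowercased).
function parseStyle(style) {
  const props = new Map();
  for (const decl of style.split(";")) {
    const colon = decl.indexOf(":");
    if (colon === -1) continue; // skip empty or malformed declarations
    const name = decl.slice(0, colon).trim().toLowerCase();
    const value = decl.slice(colon + 1).trim().toLowerCase();
    if (name) props.set(name, value);
  }
  return props;
}

// Heuristic: true when the style hides content or pushes it far off-page.
// The threshold (in px) is an arbitrary assumption.
function looksHidden(style, threshold = 500) {
  const props = parseStyle(style);
  if (props.get("display") === "none") return true;
  if (props.get("visibility") === "hidden") return true;
  for (const name of ["top", "left", "text-indent"]) {
    const match = /^(-?\d+(?:\.\d+)?)px$/.exec(props.get(name) || "");
    if (match && Number(match[1]) <= -threshold) return true;
  }
  return false;
}

// Collect the quoted inline style attributes in a fragment that look hidden.
function findHiddenStyles(html) {
  const found = [];
  const styleAttr = /\sstyle\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  let match;
  while ((match = styleAttr.exec(html)) !== null) {
    const style = match[1] !== undefined ? match[1] : match[2];
    if (looksHidden(style)) found.push(style);
  }
  return found;
}

const comment =
  'Good...<div style="position: absolute; top: -1000px; left: -1000px; visibility: hidden;">spam</div>';
console.log(findHiddenStyles(comment).length); // 1
```

A check like this is only useful for flagging comments for review; the more dependable fix is not to let the style attribute through in the first place.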

It would seem that Blogware doesn’t properly sanitize HTML in comments, allowing the style attribute through. A dangerous practice, given that comments come from outside the system and so should not be trusted. Mark Pilgrim talked about the dangers of untrusted HTML back in 2003; although he’s talking about HTML in RSS feeds, the points he raises and the suggestions he makes are just as valid for comments:

HTML is nasty. Arbitrary HTML can carry nasty payloads: scripts, ActiveX objects, remote image web bugs, and arbitrary CSS styles that [...] can take over the entire screen.

Still, dealing with arbitrary HTML is not impossible. [...] I offer this advice:

  • Strip script tags. This almost goes without saying. [...]
  • Strip embed tags.
  • Strip object tags.
  • Strip frameset tags.
  • Strip frame tags.
  • Strip iframe tags.
  • Strip meta tags, which can be used to hijack a page and redirect it to a remote URL.
  • Strip link tags, which can be used to import additional style definitions.
  • Strip style tags, for the same reason.
  • Strip style attributes from every single remaining tag. [...]
Alternatively, you can simply strip all but a known subset of tags. Many comment systems work this way. You’ll still need to strip style attributes though, even from the known good tags.
A quick play around with comment previewing suggests that Blogger does quite well on these, disallowing all the tags above and more, although it’s not clear if it disallows the style attribute completely or whether it simply disallows or allows specific style properties. Blogware does quite poorly, rejecting the <script> tag but appearing to allow everything else. Unless Blogware performs more stringent validation or stripping on submit than it does on preview, it’s handing malicious commenters quite an arsenal to work with.
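For illustration, here is a minimal sketch of the “strip all but a known subset of tags” approach, with every attribute dropped except a plain http(s) `href` on links. This is not how Blogger or Blogware actually work; the tag list, the function names, and the decision to add `rel="nofollow"` are all my own assumptions. It is regex-based for brevity, so treat it as a demonstration of the idea; a real system should use a proper HTML parser.

```javascript
// Hypothetical allowlist; anything not listed here is removed.
const ALLOWED_TAGS = new Set(["a", "b", "i", "em", "strong", "p", "br", "blockquote"]);

// Neutralise stray angle brackets left in text.
function escapeText(text) {
  return text.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// As above, plus double quotes, for use inside a quoted attribute value.
function escapeAttr(value) {
  return escapeText(value).replace(/"/g, "&quot;");
}

function sanitizeComment(html) {
  // Matches an opening or closing tag: slash, tag name, raw attribute text.
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
  let out = "";
  let last = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    out += escapeText(html.slice(last, match.index));
    last = tagPattern.lastIndex;
    const tag = match[2].toLowerCase();
    // Disallowed tags are dropped; the text between them is kept as text.
    if (!ALLOWED_TAGS.has(tag)) continue;
    if (match[1] === "/") {
      out += `</${tag}>`;
      continue;
    }
    // Every attribute is discarded, style included. The one exception is
    // a quoted http(s) href on a link.
    let attrs = "";
    if (tag === "a") {
      const href = /\shref\s*=\s*(?:"([^"]*)"|'([^']*)')/i.exec(match[3]);
      const url = href ? (href[1] !== undefined ? href[1] : href[2]) : "";
      if (/^https?:\/\//i.test(url)) {
        attrs = ` href="${escapeAttr(url)}" rel="nofollow"`;
      }
    }
    out += `<${tag}${attrs}>`;
  }
  return out + escapeText(html.slice(last));
}

console.log(sanitizeComment('<b style="position:absolute">hi</b>')); // <b>hi</b>
```

One side effect worth noting: run over the poker comment above, the hiding `div` is dropped but its text survives, so the spam becomes visible on the page, where a human moderator can see and delete it.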

Categories: Spam

Javascript citations, round 3

A minor but necessary tweak to the citation code discussed earlier: escape the URL you’re searching for when you form Bloglines and Technorati search URLs.

Why? Because Radio-based blogs, like Scobleizer and Scripting News, use permalink URLs including fragment identifiers. For example, the most recent post on Scobleizer has a permalink URL of:

http://radio.weblogs.com/0001011/2005/07/31.html#a10804

This permalink identifies a particular post on the blog’s 31/07/2005 archive page; a browser resolves it by requesting the URL that precedes the # and searching the returned page for the fragment identifier that follows the #.

So, if I form a Technorati search URL by naive concatenation, as I had been doing, you get:

http://www.technorati.com/search/http://radio.weblogs.com/0001011/2005/07/31.html#a10804 (no links at time of writing)

Close, but this isn’t actually searching for what I want it to search for. What’s actually happening here is a Technorati search for all citations of the 31/07/2005 archive page, followed by an unsuccessful in-browser search of the resulting page for the a10804 fragment identifier.

What I need to do is have Technorati search for the entire Scobleizer URL, including the fragment; the way to do this is to escape() the search URL before concatenation, which encodes the # and hides it from the browser’s special treatment:

http://www.technorati.com/search/http%3A//radio.weblogs.com/0001011/2005/07/31.html%23a10804 (6 links at time of writing)
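The difference between the two URLs can be sketched in a few lines. The helper names here are mine, not the ones in the actual citation code, and the sketch uses the standard `URL` class purely to show which part of each URL ends up as the fragment.

```javascript
const SEARCH_BASE = "http://www.technorati.com/search/";

// Naive concatenation: the "#" in the target starts a fragment, which the
// browser keeps to itself and never sends to the server.
function naiveSearchUrl(target) {
  return SEARCH_BASE + target;
}

// Escaping first turns "#" into "%23", so the whole permalink stays in
// the path of the search URL.
function escapedSearchUrl(target) {
  return SEARCH_BASE + escape(target);
}

const permalink = "http://radio.weblogs.com/0001011/2005/07/31.html#a10804";

const naive = new URL(naiveSearchUrl(permalink));
const escaped = new URL(escapedSearchUrl(permalink));

console.log(naive.hash);   // "#a10804"
console.log(escaped.hash); // ""
console.log(escapedSearchUrl(permalink));
```

A caveat: `escape()` is now deprecated, and `encodeURIComponent()` is the usual modern replacement. It also encodes the slashes, though, and whether a given search service accepts a target encoded that way is something you would need to check for yourself.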

That’s better. So, oops; my mistake. And the lesson: beware quick and dirty hacks.

Tuesday, August 02, 2005

“I link to things I like”

The Technorati Top 100 List again came under fire as a “good old boys network” at last weekend’s BlogHer conference. Robert Scoble discusses and defends it:

How do you change this? I have some ideas. But, they require you to put in the work. I blog every day from 6 p.m. to 2 a.m. and on weekends. And that’s after putting in a day’s work doing a video blog for Microsoft and answering email and doing a bunch of networking.

If you’re willing to put in the work day after day after day for five years you’ll find yourself in the good old boys network too.
Scoble: hardest working man in the blog business. You’re not getting on that list, gals, because you’re just not working hard enough.

He treads similar ground in his response to Renee Blodgett’s suggestions for a female speakers list:

Renee, we already have that list. It’s called Google (or MSN or Yahoo, they all pretty much work similar).

[…]

Here’s a hint: you can get on those lists. Just blog and blog well.

So the real trick isn’t to make some sort of new list. It’s to teach people how search engines work and how to get other people to notice that they have expertise in a certain area.
That’s right, gals: you’re not getting on the list because you don’t understand how Google works.

*sigh*

Nothing’s changed since Shelley Powers wrote Guys Don’t Link, has it? Women aren’t missing from the Top 100 because they’re not working hard, or because they’re not working the system, or even because—as these statements seem to be carefully avoiding saying directly—they’re simply not good enough. They’re missing from the Top 100 because the good old boys aren’t linking to them.

As if to prove his critics wrong, Robert threw out links to three women on Sunday. That’s nice. But it’d be nicer if one of them—Dori Smith, of whom Robert says “I’m permanently in her debt”—hadn’t previously had sand kicked in her face:

I see that Dori Smith is insisting that she’s invisible again. I don’t get that. Dori, have you ever thought that we don’t link to you because you’re talking about Diet drinks and things to do in California’s wine country instead of geeky stuff?
Dori’s response to that was restrained, but angry:

Okay, let’s do a count of posts that’ll be on this page after this goes up:
  • Posts by Tom about diet drinks: 1
  • Posts by Dori about Healdsburg: 1
  • Posts by Dori about geeky topics, or stuff that at least started out as geeky topics: 9
So, what can we take from this? Robert noticed only two posts: one that I didn’t write, and one (out of ten) that I did write that was on a non-geeky topic. And while he disagrees about my perception that I’m invisible, I think that he just did an excellent job of proving my point.
Zing. There’s more good stuff in her comments, too. But back to Robert’s original post, which included something I found boggling:

I totally disagree that a link doesn’t mean something. When I link to something I KNOW I’m voting for it. So, I don’t link to things I don’t want to go up the search engines. I thought about using the “no follow” attribute, but to be honest, even a nofollow link is a vote. Such a link still sends traffic and since some of my friends are making more than $10,000 a month on Google ads such a link is a very real increase in income.

So, I link to things I like. You should do the same.
I disagree. Linking only to things you like doesn’t leave much room for criticism: for what’s the point in talking critically about something without linking to it? Linkless criticism is annoying to readers (“what’s he talking about?”) and unhelpful in forming a wider conversation (to search engines, the link is the connection between your commentary and the piece you’re criticising).

Only linking to things you like risks leading to only talking about things you like. Which surely isn’t a good thing; being nice all the time might cement your popularity amongst the blogger circle jerk, but it doesn’t quite ring true. Real people don’t gush all the time.

Nick Nichols, writing in the comments to that post, says:

A link is a link. It’s not a vote. Indeed, if someone is making an ass of themselves, or expressing general stupidity, the best thing you can do is to give it (and the person) exposure so no one is later fooled about that person if he sounds sane at the moment. If it’s something you disagree with, then exposing what you disagree with makes your position even clearer.
I agree. A link isn’t an approval of what it points to; it’s an exposure of it. A link, by itself, doesn’t say “this is good” or “this is bad”; a link says only “this is significant”. It’s the commentary surrounding the link which expresses why it’s significant.

But there’s a more insidious angle to this, which ties it neatly back to Guys Don’t Link. “Things we like” are likely to be written by “people like us”. Not linking outside your comfort zone may mean you’re not linking outside your socioeconomic peer group. And so the circle closes around the good old boys.

And finally, Robert, if you’re worrying more about the search engine rating, traffic, and advertising dollar impact of your links than about what you have to say about what you’re linking to: haven’t you rather lost touch with your “blogging as conversation” beliefs? This is blogging as power; link as big swinging weapon. And isn’t that exactly what Shelley was talking about?