Whistle


Whistle personal URL shortener


 

Whistle is an open source, algorithmically reversible, personal URL shortener.

There is an instance of Whistle running at http://ttk.me.

 

Note: if you're looking for an open source, database-based, generic (any) URL shortener, see other projects like ur1.ca (note: that's U R ONE dot C A), yourls.org, etc. Feel free to suggest others.

 

building blocks

Whistle makes use of the following building blocks:

 

clients

Whistle is used by:

 

design requirements

summary of design requirements for Whistle

 

 

simple example

Here is a sample actual permalink URL and how it's converted to a permashortlink:

 tantek.com/2010/034/t2/diso-2-personal-domains-shortener-hatom-push-relmeauth

The slug is purely for display and search; the unique portion of the URL is actually:

 tantek.com/2010/034/t2

this compresses to the following Whistle short URL:

 ttk.me/t4432

via /t2 = 't' text note number '2' for the day, and

/2010/034 == 2010-034 ordinal ISO8601 date == '443' in sexagesimal NewBase60 epoch days, and

ttk.me is the short domain for tantek.com.

This algorithm is reversible and thus inverting it resolves the short URL into the original permalink.
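
As a rough illustration of the shortening direction, here is a minimal Python sketch (not the CASSIS implementation). The function names `to_sxg` and `shorten` are made up for this example, the digit alphabet is the one from the published NewBase60 description, and zero-padding the date to 3 digits is an assumption that follows from the fixed-width format described on this page. It only handles permalinks of the /YYYY/DDD/tN shape shown above.

```python
from datetime import date

# NewBase60 digits (assumed from the published NewBase60 description):
# 0-9, A-Z without I and O, underscore, a-z without l.
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"
EPOCH = date(1970, 1, 1)  # day zero for "epoch days"


def to_sxg(n: int) -> str:
    """Encode a non-negative integer as a NewBase60 string."""
    if n == 0:
        return "0"
    digits = ""
    while n > 0:
        n, r = divmod(n, 60)
        digits = ALPHABET[r] + digits
    return digits


def shorten(path: str, short_domain: str = "ttk.me") -> str:
    """Turn '/YYYY/DDD/tN[/slug]' into a short URL; any slug is ignored."""
    year, ordinal, post = path.strip("/").split("/")[:3]
    # Convert the ordinal date (year + day of year) to days since 1970-01-01.
    day = date(int(year), 1, 1).toordinal() + int(ordinal) - 1
    epoch_days = day - EPOCH.toordinal()
    kind, n = post[0], int(post[1:])
    # Pad the datestamp to 3 digits with '0', the NewBase60 zero digit.
    return f"{short_domain}/{kind}{to_sxg(epoch_days).zfill(3)}{to_sxg(n)}"


print(shorten("/2010/034/t2/diso-2-personal-domains-shortener-hatom-push-relmeauth"))
# ttk.me/t4432
```

Because the slug is dropped before encoding, the same short URL comes out whether or not the slug is present.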

 

why

Why short URLs, why is it necessary to run your own URL shortener, why the Whistle design, and how does it work?

 

why short URLs

Historically the need for "short" URLs is not new. But what we mean by "short" has certainly changed a lot recently.

 

Old email systems would wrap text at 80 characters (or a few less), which made long URLs just hard enough to reliably reconstruct or use that many systems adopted a common practice of keeping URLs to 70 characters or less by design.

 

This was fundamentally usability driven.

 

Shorter URLs in email are easier to use and more reliable. They're nicer in IM too.  And browser screenshots. I've retyped URLs from screenshots in slides, I'm sure many of you have too.

 

How about print? Ever typed in a URL from a book? Or advertising. Magazine spreads or billboards - URLs are ubiquitous.  See ShortURLPrintExample for actual documented real-world examples of short URLs in print.

 

The easier to read and type-in, the more folks visit the URL.

 

But again, this is nothing new. Ever since the dotcom boom URLs have become a part of our visual language (much to the chagrin of linguists I'm sure).

 

why do you need your own

Why do you need your own URL shortener and short domain?

 

Twitter rewrote our brains to think in 140 characters and suddenly every one of those characters counted.

 

And two things happened:

  1. URL shortener services showed up which would trim any URL down to a small handful of characters, saving your precious tweetspace for your own words. Everyone started using them. Twitter and clients started auto-shortening URLs.
  2. We started to understand just how fragile these shorteners are, and how they break the web. How many shortener sites have died taking their links with them to the bit bucket? Even tr.im, which is keeping the lights on longer than others, is set to shut down at the end of 2010. It was frustration with tr.im's downtimes and then end-of-service announcement that led me to this realization: It's not good enough to have your own URL; you need to have your own shortener as well.

 

This isn't just for independents. Companies and hosting services should have their own too. The first big site to realize and do this right was Flickr, and many have followed.

 

The key here is that when you own and host your own shortener for links to your content, you're not adding any more fragility to the web. If your shortener goes down, your site probably is down as well. They're tied together. No additional risk. Unless you use a database for the shortenings and you lose your database because you were unwilling (or unable) to pay the DBA tax to maintain it. We'll talk more about the DBA tax problem in due time.

 

But why is it important to own the shortened links to your content? Why not just always share your full "long" URLs?

 

In short:

  1. You can't always do so. E.g. Twitter now auto-shortens many URLs.
  2. Shorter URLs tend to be better for sharing (for all the reasons discussed at the top). 

 

And that #2 is where we get to DiSo.  A couple of the key architectural components of DiSo 2.0 are:

  1. Publish on your own site, own your URLs, your permalinks, and
  2. Syndicate out to other sites. Your text updates to Twitter, your checkins to Foursquare, your photos to Flickr etc.

 

The direction of the content flow is very important here, as it has to do with ownership, and what's the original vs. what's just a copy.

 

It's ok to sharecrop copies, especially when the copies link back to your original. That's called distribution.

 

It's not ok to sharecrop the original and aggregate copies on your own site. You're still sharecropping and you're still beholden/vulnerable to those 3rd party sites going down, censoring your content, renaming you, or being blocked by some nationwide internet filtering firewall.

 

In a DiSo solution, when you syndicate your content out to other sites, the key is that those syndicated copies of your content link back to the original. Permalinks serve this role for blog posts. For short text updates that you syndicate to Twitter or Identi.ca etc., you need perma-short-links. And that's where your own shortener is essential.

 

why an algorithmic URL shortener

One of the key emphases of the DiSo 2.0 I've outlined is maintainability. Fewer moving parts, fewer magic hidden files, fewer things that can inexplicably fail = more independents successfully running and owning their own sites, identities, web presences over time.

 

Nearly all (maybe all?) open source URL shorteners today use a database to store the pairs of "short code" and "actual URL". If you lose that database, forget to back it up, have some bad database code that corrupts it etc., your shortlinks are gone, dead, useless.

 

If instead you create and use a URL shortener to create shortlinks that are algorithmically reversible, and then document that algorithm, publicly, then anyone can figure out how to expand your shortlinks. If they happen upon them on some random site, they can expand them and look for the original, or at least know that you're linking to the same thing that a normal permalink somewhere else is expressing.

 

In addition, all manner of browser or aggregator tools and sites that currently have to manually resolve shortlinks by calling the APIs of their services can save the bandwidth and time and simply decode your URLs themselves.

 

Once again, Flickr set a very good example with their http://flic.kr/ shortener for Flickr photos.

 

In fact, their doing so inspired me within days to grab http://ttk.me/ and set it up to redirect to my site http://tantek.com/, knowing I would eventually (as I have) add various shortening services to it.

 

Similarly I encourage every independent out there, everyone who wants to install and/or run their own DiSo implementation (like Falcon), to go ahead and not just grab a domain name for themselves, but also grab a shortener domain too. Set it up to redirect to your primary domain for now.

 

why human readability

Short URLs are used in contexts where humans end up reading them and typing them in by hand. Some examples:

 

why one letter content type codes

Two more things that Kellan got right in the Flickr shortener which I've also found inspiration in:

 

  1. just a "/p/" to indicate "Photo" presumably (clever idea to prefix like that to allow for other prefixes to do other things)
  2. and then a Base58 compressed photo id. 

 

Regarding 1, I've also settled on one-character "spaces" for different types of URLs. "p" for photo makes sense to re-use. After quite a bit of personal research into what types of content are different enough and used often enough to warrant their own short URL spaces, I've come up with about 20 different content types, each with their own letter.

 

Here are a few examples from my content-type short codes:

 

 

 

These and the rest are documented more fully in the "design" section below.

 

 

how do the t short URLs work

Specifically for text notes, I decided to keep my "t" shortener as short as possible, which meant dropping a trailing "/".

 

After that I use a 3 digit sexagesimal (Base60) number to represent the date in a manner deliberately limited to human individuals. Why Base60? Lots of reasons, including print-safety (as mentioned above). Want to read the entire derivation and reasons why? See NewBase60 (includes open source CASSIS implementation).

 

Why 3 sexagesimal digits to represent the date? It turns out that 3 sexagesimal digits are capable of representing over 500 years of days - plenty overengineered for any human lifetime. And if anyone does figure out how to live more than 500 years, I have a feeling that person will not much resemble a human as we know it, and will either have bigger problems to deal with than URL shortener limitations, or will be so smart that they will come up with a better solution.
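
The "over 500 years" figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the variable names are made up for this example, and the start day of 1970-01-01 is the day zero this page settles on.

```python
from datetime import date, timedelta

DAYS_PER_YEAR = 365.2425           # mean length of a Gregorian year
capacity_days = 60 ** 3            # distinct values of a 3-digit base 60 number
years = capacity_days / DAYS_PER_YEAR

# Last day a 3-digit datestamp can express when day zero is 1970-01-01.
last_day = date(1970, 1, 1) + timedelta(days=capacity_days - 1)

print(capacity_days)   # 216000
print(int(years))      # 591
print(last_day.year)   # 2561
```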

 

But for now, for our feeble less than 200 year lifetimes, this is good enough. In addition we can even agree on a day zero that computes well with existing platforms. Unix Epoch start: 1970-01-01. Given that no-one published anything to the web before 1990, I think we're ok with that. What happens in a few hundred years? Perhaps people can pick their own day zeroes as they see fit.

 

Thus the 3 characters after the "t" represent the number of days since 1970-01-01 in sexagesimal - what I'm calling "epoch days".
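
To make "epoch days" concrete, here is a small hedged Python sketch that decodes a 3-character datestamp back into a calendar date. The names `from_sxg` and `datestamp_to_date` are hypothetical, and the alphabet is the one from the published NewBase60 description; this is not the CASSIS code.

```python
from datetime import date, timedelta

# NewBase60 digits (assumed from the published NewBase60 description).
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"


def from_sxg(s: str) -> int:
    """Decode a NewBase60 string; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in s:
        n = n * 60 + ALPHABET.index(ch)
    return n


def datestamp_to_date(stamp: str) -> date:
    """Interpret a NewBase60 datestamp as days since 1970-01-01."""
    return date(1970, 1, 1) + timedelta(days=from_sxg(stamp))


d = datestamp_to_date("443")
print(from_sxg("443"), d.isoformat(), d.strftime("%Y-%j"))
# 14643 2010-02-03 2010-034
```

The last value printed is the ordinal (year and day-of-year) form of the date, which is the form used in the permalink paths.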

 

Finally I allow for 1 (or 2, but haven't needed it yet) more sexagesimal digit to indicate the nth ordinal post of that type for that day. Thus:

 

ttk.me/tSSSn

 

 

This is sufficient to expand to:

 

tantek.com/YYYY/DDD/tn/

 

 

Which I then redirect server-side to a longer URL with post keywords (AKA "slug") on the end. E.g.

 

ttk.me/t4432 is

 t = text note, 443 = epoch days for 2010-034, 2 = 2nd text note of that day,

thus expands to:

 

tantek.com/2010/034/t2/

 

which is enough for Falcon to retrieve the post in the hAtom store, where it also gets the keyword/slug phrase for the post, and uses it to redirect it to: 

 

tantek.com/2010/034/t2/diso-2-personal-domains-shortener-hatom-push-relmeauth 
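
The expansion step (short URL to slug-less permalink) can be sketched in Python as follows. This is an illustration under assumptions, not Falcon's or Whistle's actual code: the function name `expand` is hypothetical, it only handles the "t" space in the ttk.me/tSSSn shape described above, and the final redirect to the slug URL is left out because it needs a lookup in the post store.

```python
from datetime import date, timedelta

# NewBase60 digits (assumed from the published NewBase60 description).
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"


def from_sxg(s: str) -> int:
    """Decode a NewBase60 string into an integer."""
    n = 0
    for ch in s:
        n = n * 60 + ALPHABET.index(ch)
    return n


def expand(short_url: str, long_domain: str = "tantek.com") -> str:
    """Expand '<shortdomain>/tSSSn' into '<longdomain>/YYYY/DDD/tn/'."""
    code = short_url.rstrip("/").rsplit("/", 1)[-1]      # e.g. 't4432'
    kind, stamp, nth = code[0], code[1:4], code[4:]
    if kind != "t" or len(stamp) != 3 or not 1 <= len(nth) <= 2:
        raise ValueError(f"not a text-note short URL: {short_url!r}")
    day = date(1970, 1, 1) + timedelta(days=from_sxg(stamp))
    # %j is the zero-padded day of the year, i.e. the DDD of the ordinal date.
    return f"{long_domain}/{day:%Y/%j}/{kind}{from_sxg(nth)}/"


print(expand("ttk.me/t4432"))
# tantek.com/2010/034/t2/
```

Nothing here touches a database, which is the point of the design: anyone who knows the documented scheme can run the same computation.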

 

design

Design notes:

 

 

Replies, responses, rebuttals used to be part of 'r', but in practice are more of an aspect of a post rather than type of post in and of themselves. A 't'ext note can be a reply, and a 'b'log post can also be a reply. The "replyness" of a post is determined by the presence of a link with in-reply-to markup.

 

I've been posting RSVPs in practice as 't' notes with both an in-reply-to link to an event post, and an explicit p-rsvp property. The presence of the p-rsvp property is sufficient to determine that it is an RSVP post, or subtype of text note.

 

under consideration

 

design related analysis

Other single-letter post type schemas / short URL designs:

 

implementation

Whistle has an implementation of the following:

 

interviews

Interview by Steve Ivy published on monkinetic:

 

FAQ

Why not use days since you were born

Q: Why not use days since you were born instead of days since epoch start (1970-01-01) ?

A: In short: 1. easier debugging, 2. birthday privacy. First, from a practical perspective, reusing epoch start makes it easier to debug: 0 datestamp means 0 epoch time, everyone's personal permalinks share the same NewBase60 datestamps etc.  And second, using your birthday as your 0-day for permalink datestamps would have the side-effect of publishing the precise year/month/day of your birthday, which not everyone may want to do - in fact, typically people still keep their full birthday private rather than publishing it openly on the web. Long term, if this encoding scheme is still used in say 200+ years, it may make sense to pick a new day zero for folks born after a certain point in time (e.g. perhaps 2200-001 for everyone born on that day or later).

 

Why not just use the short URLs all the time?

Q: Why redirect? Why not just use the short URLs all the time?

A: In short: 

1. Aesthetic friendliness. Compressed characters (even NewBase60) look like line noise errors to most people.

2. Branding. "tantek.com" (or whatever your own personally recognizable domain) has value purely by name.

3. SEO. Using a long URL with a title/keyword slug as the final segment helps with better search result placement.

 

Why use short ids instead of URLs in syndicated notes?

Q: Why use permashortids like (ttk.me t4MY1) in syndicated tweets instead of URLs like http://ttk.me/t4MY1?

A: In Twitter (and other short messaging systems), there's a strong user expectation that any link will provide more content than the tweet (short text note) itself. When a link merely shows the user what they've already seen (even if it is a nicer UI, the original posting URL etc.), they tend to get upset (went through an actual iteration of this, with plenty of feedback from colleagues - some even publicly posted ;). So now I only provide a full permashortlink URL at the end of my syndicated text notes if there is more content at the original, e.g. photos, videos, expansion of elided text, etc.

 

Why use space as a delimiter inside permashortids?

Q: Why use space instead of punctuation as a delimiter inside permashortids?

A: I actually started with using a slash "/" as a delimiter because it made the permashortids URL-like (recognizable) without making them clickable in Twitter - for a while, until they changed their auto-linker to link them up (which defeated the purpose of making them short ids; see previous FAQ).

 

Once back to the drawing board, I reanalyzed the problem and realized that the permashortid at the end of my tweets is essentially a shorthand citation to the original work. Thus I researched existing shorthand citation conventions / formats and re-used accordingly.

Both Harvard and The Chicago Manual of Style (click the "AUTHOR-DATE" tab to see examples) use the format:

(author date)

[for more on citation formats/styles, I've collected some research here: http://microformats.org/wiki/citation-formats#styles, naturally :) ]

Thus for my posts the analogy to

 

(author date)

 

is

 

(shortdomain datedID)

 

e.g.

 

(ttk.me t4MY1)

 

where


Also from a readability design perspective, even if you didn't have the prior art, alternatives like more punctuation (a ":" or an "=") are noisier (among the text) and uglier too (for humans). Punctuation should only be used when doing so is an improvement over not. Since the parentheses are already containing the citation information, a space is a reasonable delimiter - no colon or equals or other symbols necessary.
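
For tools that want to turn such a citation back into a link, here is a hedged Python sketch. The regular expression and the function names `cite` and `permashortid_to_url` are assumptions made for this example, based only on the (shortdomain datedID) shape described above, and the http scheme is assumed from the URLs used on this page.

```python
import re
from typing import Optional

# A parenthesized "(shortdomain datedID)" citation at the end of a note.
# The pattern is an assumption for illustration, not a documented grammar.
PERMASHORTID = re.compile(
    r"\(([a-z0-9.-]+\.[a-z]{2,}) ([A-Za-z][0-9A-Za-z_]+)\)\s*$"
)


def cite(short_domain: str, short_id: str) -> str:
    """Format a permashortid citation, with a space as the only delimiter."""
    return f"({short_domain} {short_id})"


def permashortid_to_url(note: str) -> Optional[str]:
    """Return the short URL cited at the end of a note, or None if there is none."""
    m = PERMASHORTID.search(note)
    return f"http://{m.group(1)}/{m.group(2)}" if m else None


note = "Some short text note " + cite("ttk.me", "t4MY1")
print(note)
# Some short text note (ttk.me t4MY1)
print(permashortid_to_url(note))
# http://ttk.me/t4MY1
```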

 

 

Related Projects

 

