
Whistle

Saved by Tantek on July 24, 2010 at 5:13:51 pm
 

Whistle personal URL shortener

 

Whistle is an algorithmically reversible personal URL shortener.

 

There is an instance of Whistle running at http://ttk.me.

 

building blocks

Whistle makes use of the following building blocks:

 

clients

Whistle is used by:

 

design requirements

summary of design requirements for Whistle

 

  • personal URL shortener
  • personal short URL domain
  • algorithmic shortening (no database/opaque id indirection)
  • human readability / print safety (ShortURLPrintExample)
  • as short as possible (NewBase60 compression)
  • per content-type URL space partitioning (one character content-types)

 

simple example

Here is a sample actual long URL:

 tantek.com/2010/034/t2/diso-2-personal-domains-shortener-hatom-push-relmeauth

The slug is purely for display and search; the unique portion of the URL is actually:

tantek.com/2010/034/t2/

This compresses to the following Whistle short URL:

ttk.me/t4432

 

 

why

Why short URLs, why is it necessary to run your own URL shortener, why the Whistle design, and how does it work?

 

why short URLs

Historically the need for "short" URLs is not new. But what we mean by "short" has certainly changed a lot recently.

 

Old email systems would wrap text at 80 characters (or a few less), which made long URLs just hard enough to reconstruct or use reliably that many systems adopted a common practice of keeping URLs to 70 characters or less by design.

 

This was fundamentally usability driven.

 

Shorter URLs in email are easier to use and more reliable. They're nicer in IM too. And in browser screenshots. I've retyped URLs from screenshots in slides; I'm sure many of you have too.

 

How about print? Ever typed in a URL from a book? Or from an advertisement? Magazine spreads, billboards: URLs are ubiquitous. See ShortURLPrintExample for documented real-world examples of short URLs in print.

 

The easier a URL is to read and type in, the more folks visit it.

 

But again, this is nothing new. Ever since the dotcom boom, URLs have been part of our visual language (much to the chagrin of linguists, I'm sure).

 

why do you need your own

Why do you need your own URL shortener and short domain?

 

Twitter rewrote our brains to think in 140 characters and suddenly every one of those characters counted.

 

And two things happened:

  1. URL shortener services showed up which would trim any URL down to a small handful of characters, saving your precious tweetspace for your own words. Everyone started using them. Twitter and clients started auto-shortening URLs.
  2. We started to understand just how fragile these shorteners are, and how they break the web. How many shortener sites have died, taking their links with them to the bit bucket? Even tr.im, which is keeping the lights on longer than others, is set to shut down at the end of 2010. It was frustration with tr.im's downtimes, and then its end-of-service announcement, that led me to this realization: it's not good enough to have your own URL; you need to have your own shortener as well.

 

This isn't just for independents. Companies and hosting services should have their own too. The first big site to realize and do this right was Flickr, and many have followed.

 

The key here is that when you own and host your own shortener for links to your content, you're not adding any more fragility to the web. If your shortener goes down, your site is probably down as well. They're tied together, so there's no additional risk. Unless, that is, you use a database for the shortenings and lose that database because you were unwilling (or unable) to pay the DBA tax to maintain it. We'll talk more about the DBA tax problem in due time.

 

But why is it important to own the shortened links to your content? Why not just always share your full "long" URLs?

 

In short:

  1. You can't always do so. E.g. Twitter now auto-shortens many URLs.
  2. Shorter URLs tend to be better for sharing (for all the reasons discussed at the top). 

 

And #2 is where DiSo comes in. A couple of the key architectural components of DiSo 2.0 are:

  1. Publish on your own site, own your URLs, your permalinks, and
  2. Syndicate out to other sites. Your text updates to Twitter, your checkins to Foursquare, your photos to Flickr etc.

 

The direction of the content flow is very important here, as it has to do with ownership, and what's the original vs. what's just a copy.

 

It's ok to sharecrop copies, especially when the copies link back to your original. That's called distribution.

 

It's not ok to sharecrop the original and aggregate copies on your own site. You're still sharecropping and you're still beholden/vulnerable to those 3rd party sites going down, censoring your content, renaming you, or being blocked by some nationwide internet filtering firewall.

 

In a DiSo solution, when you syndicate your content out to other sites, the key is that those syndicated copies link back to the original. Permalinks serve this role for blog posts. For short text updates that you syndicate to Twitter, Identi.ca, etc., you need perma-short-links. And that's where your own shortener is essential.

 

why an algorithmic URL shortener

One of the key emphases of the DiSo 2.0 I've outlined is maintainability. Fewer moving parts, fewer magic hidden files, and fewer things that can inexplicably fail mean more independents successfully running and owning their own sites, identities, and web presences over time.

 

Nearly all (maybe all?) open source URL shorteners today use a database to store the pairs of "short code" and "actual URL". If you lose that database, forget to back it up, have some bad database code that corrupts it etc., your shortlinks are gone, dead, useless.

 

If instead you create and use a URL shortener whose shortlinks are algorithmically reversible, and then publicly document that algorithm, anyone can figure out how to expand your shortlinks. If they happen upon them on some random site, they can expand them and look for the original, or at least know that you're linking to the same thing that a normal permalink somewhere else is expressing.

 

In addition, all manner of browser or aggregator tools and sites that currently have to manually resolve shortlinks by calling the APIs of their services can save the bandwidth and time and simply decode your URLs themselves.

 

Once again, Flickr set a very good example with their http://flic.kr/ shortener for Flickr photos.

 

In fact, their doing so inspired me within days to grab http://ttk.me/ and set it up to redirect to my site http://tantek.com/, knowing I would eventually (as I have) add various shortening services to it.

 

Similarly I encourage every independent out there, everyone who wants to install and/or run their own DiSo implementation (like Falcon), to go ahead and not just grab a domain name for themselves, but also grab a shortener domain too. Set it up to redirect to your primary domain for now.
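For the "redirect to your primary domain for now" step, here is a minimal sketch using Python's standard-library http.server. It assumes http://tantek.com as the primary site (swap in your own), and the names redirect_target, RedirectHandler and serve are hypothetical. It sends a 302 (temporary) redirect on the assumption that short paths will get their own meanings later, so browsers and caches shouldn't treat the interim redirect as permanent.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the primary site this short domain should point at.
PRIMARY = "http://tantek.com"


def redirect_target(path: str) -> str:
    """Interim behavior: every short-domain path goes to the primary home page."""
    return PRIMARY + "/"


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # 302 = temporary, since shortening rules will be added later.
        self.send_response(302)
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()

    # HEAD gets the same status and headers (no body is sent either way).
    do_HEAD = do_GET


def serve(port: int = 8000) -> None:
    """Run the redirector on the given port; blocks until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling serve() starts it locally. In practice, a single redirect rule in whatever web server already hosts your site may be simpler.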

 

why one letter content type codes

Two more things that Kellan got right in the Flickr shortener which I've also found inspiration in:

 

  1. just a "/p/" to indicate "Photo", presumably (a clever choice of prefix, since it leaves room for other prefixes to do other things)
  2. and then a Base58 compressed photo id. 

 

Regarding 1, I've also settled on one-character "spaces" for different types of URLs. "p" for photo makes sense to re-use. After quite a bit of personal research into what types of content are different enough and used often enough to warrant their own short URL spaces, I've come up with about 20 different content types, each with their own letter.

 

Here are a few examples from my content-type short codes:

 

  • b - blog post, article (structured, with headings), essay
  • i - identifier - on another system using subdirectories as system id spaces
    • i/i/ - compressed ISBN numbers
    • i/a/ - compressed ASIN numbers
  • p - photo
  • t - text, (plain) text, tweet, thought, note, unstructured, untitled 

 

 

These and the rest are documented more fully in the "design" section below.

 

 

how do the t short URLs work

Specifically for text notes, I decided to keep my "t" shortener as short as possible, which meant dropping a trailing "/".

 

After that I use a 3-digit sexagesimal (Base60) number to represent the date, with a range deliberately scoped to an individual human lifetime. Why Base60? Lots of reasons, including print-safety (as mentioned above). Want to read the entire derivation and reasons why? See NewBase60 (includes an open source CASSIS implementation).
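To make the Base60 part concrete, here is a minimal Python sketch of NewBase60 encoding and decoding. It assumes the 60-character alphabet described on the NewBase60 page (0-9, A-Z without I and O, "_", a-z without l). The function names are illustrative, not the CASSIS implementation, and this sketch simply rejects characters outside the alphabet.

```python
# NewBase60 alphabet: digits, uppercase without I and O, underscore,
# lowercase without l (60 characters total).
ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def num_to_sxg(n: int, width: int = 1) -> str:
    """Encode a non-negative integer in NewBase60, left-padded with '0' to width."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n > 0:
        n, r = divmod(n, 60)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(width, "0")


def sxg_to_num(s: str) -> int:
    """Decode a NewBase60 string, rejecting characters outside the alphabet."""
    n = 0
    for ch in s:
        if ch not in INDEX:
            raise ValueError(f"not a NewBase60 character: {ch!r}")
        n = n * 60 + INDEX[ch]
    return n
```

For example, num_to_sxg(14643) gives "443", the epoch-day digits in the ttk.me/t4432 example below, and sxg_to_num("443") gives 14643 back.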

 

Why 3 sexagesimal digits to represent the date? It turns out that 3 sexagesimal digits can represent over 500 years of days, more than enough for any human lifetime. And if anyone does figure out how to live more than 500 years, I have a feeling that person will not resemble a human as we know it very much, and will either have bigger problems to deal with than URL shortener limitations, or be so smart that they'll come up with a better solution.
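The arithmetic behind "over 500 years", taking the mean Gregorian year of 365.2425 days:

```latex
60^3 = 216\,000 \text{ days}, \qquad
\frac{216\,000 \text{ days}}{365.2425 \text{ days/year}} \approx 591.4 \text{ years}
```

With day zero at 1970-01-01, the last three-digit epoch day (215999) falls in the year 2561.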

 

But for now, for our feeble sub-200-year lifetimes, this is good enough. In addition, we can agree on a day zero that computes well with existing platforms: the Unix Epoch start, 1970-01-01. Given that no one published anything to the web before 1990, I think we're ok with that. What happens in a few hundred years? Perhaps people can pick their own day zeroes as they see fit.

 

Thus the 3 characters after the "t" represent the number of days since 1970-01-01 in sexagesimal - what I'm calling "epoch days".

 

Finally, I allow for 1 (or 2, though I haven't needed it yet) more sexagesimal digit to indicate the nth ordinal post of that type for that day. Thus:

 

ttk.me/tSSSn

 

  • SSS = sexagesimal epoch days
  • n = nth post that day

 

This is sufficient to expand to:

 

tantek.com/YYYY/DDD/tn/

 

 

Which I then redirect server-side to a longer URL with post keywords (AKA "slug") on the end. E.g.

 

ttk.me/t4432 is

 

  • t - text note
  • 443 - epoch day 443 in sexagesimal (14643 in decimal), i.e. 2010-02-03, the 34th day of 2010
  • 2 - 2nd text note that day

 

thus expands to:

 

tantek.com/2010/034/t2/

 

which is enough for Falcon to retrieve the post from the hAtom store, where it also gets the post's keyword/slug phrase and uses it to redirect to:

 

tantek.com/2010/034/t2/diso-2-personal-domains-shortener-hatom-push-relmeauth 
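Because the expansion is pure arithmetic, anyone can reverse a t short URL without a database. Here is a minimal sketch. It assumes the hypothetical function expand_text_path receives the path after "ttk.me/" and that the long URL writes the ordinal in decimal. The examples here only use single digits, where decimal and NewBase60 agree. The final hop to the slugged URL needs Falcon's hAtom store and isn't covered.

```python
from datetime import date, timedelta

ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
EPOCH = date(1970, 1, 1)


def sxg_to_num(s: str) -> int:
    """Decode a NewBase60 string, rejecting characters outside the alphabet."""
    n = 0
    for ch in s:
        if ch not in INDEX:
            raise ValueError(f"not a NewBase60 character: {ch!r}")
        n = n * 60 + INDEX[ch]
    return n


def expand_text_path(code: str, domain: str = "tantek.com") -> str:
    """Expand 'tSSSn' (or 'tSSSnn') to 'domain/YYYY/DDD/tn/' without any lookup."""
    if not code.startswith("t") or len(code) not in (5, 6):
        raise ValueError(f"not a t short code: {code!r}")
    published = EPOCH + timedelta(days=sxg_to_num(code[1:4]))
    nth = sxg_to_num(code[4:])
    day_of_year = published.timetuple().tm_yday
    # Assumption: the long URL writes the ordinal in decimal.
    return f"{domain}/{published.year}/{day_of_year:03d}/t{nth}/"
```

So expand_text_path("t4432") returns "tantek.com/2010/034/t2/", the same path as in the walkthrough above.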

 

design

Design notes:

  • single-letter content-type prefix
    • a - audio recording, speech, talk, session, sound 
    • b - blog post, article (structured, with headings), essay
    • c - code, sample code, library, open source, code example
    • d - diff, edit, change
    • e - event - hCalendar
    • f - favorited - primarily just a URL, often to someone else's content. For more, see 'r' below
    • g - geolocation, location, checkin, venue checkin, dodgeball, foursquare
    • h - hyperlink - e(x)ternal reference, link, etc.: a short URL for linking to things that I expect to die or move, i.e. untrustworthy permalinks.
    • i - identifier - on another system using subdirectory as system id space
    • j - reserved
    • k - reserved
    • l - (skipped due to resemblance to 1, per the print-safety design principle; related: ShortURLPrintExample)
    • m - (message like email, permalink to external list archive, or private blog archive, or a sender-hosted message)
    • n - reserved
    • o - physical objects (e.g. stuff from Amazon, or URLs attached to actual specific physical objects) 
    • p - photo (re-using Flickr's design choice of flic.kr/p/ for photo short URLs)
    • q - reserved
    • r - review, recommendation, comment regarding/response/rebuttal - hReview/xfolk
    • s - slides, session presentation, S5 
    • t - text, (plain) text, tweet, thought, note, unstructured, untitled 
    • u - (update, could be used for status updates of various types)
    • v - video recording 
    • w - work, work in progress, wiki, project, draft, task list, to-do, do, gtd
    • x - XMDP Profile 
    • y - reserved
    • z - reserved 
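Here is a minimal sketch of dispatching on the single-letter prefix, assuming a short URL path whose first character names its content type. The shortened labels, the CONTENT_TYPES table and the content_type function are illustrative. Reserved letters map to None and are rejected, and "l" is absent entirely.

```python
# Single-letter content-type prefixes from the design list above.
# Reserved letters map to None; "l" is deliberately absent (print safety).
CONTENT_TYPES = {
    "a": "audio", "b": "blog post", "c": "code", "d": "diff",
    "e": "event", "f": "favorited", "g": "geolocation", "h": "hyperlink",
    "i": "identifier", "j": None, "k": None, "m": "message", "n": None,
    "o": "physical object", "p": "photo", "q": None, "r": "review",
    "s": "slides", "t": "text", "u": "update", "v": "video", "w": "work",
    "x": "XMDP profile", "y": None, "z": None,
}


def content_type(path: str) -> str:
    """Return the content type named by the first character of a short URL path."""
    prefix = path.lstrip("/")[:1]
    if prefix not in CONTENT_TYPES:
        raise ValueError(f"unknown prefix: {prefix!r}")
    kind = CONTENT_TYPES[prefix]
    if kind is None:
        raise ValueError(f"reserved prefix: {prefix!r}")
    return kind
```

A shortener built this way can route "/t4432" to its text-note expander and "/p/..." to a photo handler, keeping each content type's URL space independent.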

 

  • t - text note specific short URL design: /tSSSn
    • SSS - NewBase60 epoch days
    • n - nth post for the day

 

implementation

Whistle has an implementation of the following:

  • single-letter content-types (on ttk.me for tantek.com)
    • i - identifier - on another system using subdirectory as system id space
    • t - text, (plain) text, tweet, thought, note, unstructured, untitled
    • w - work, work in progress, wiki, project, draft, task list, to-do, do, gtd
      • for now the 'w/' short URLs simply redirect to this wiki; eventually they'll redirect to wiki pages on tantek.com, likely hosted/published by Falcon
  • single-letter content-types (on ufs.cc for microformats.org)

 

 

interviews

Interview by Steve Ivy published on monkinetic:

 

FAQ

Why not use days since you were born

Q: Why not use days since you were born instead of days since epoch start (1970-01-01) ?

A: In short: 1. easier debugging, 2. birthday privacy. First, from a practical perspective, reusing the epoch start makes debugging easier: a 0 datestamp means 0 epoch time, everyone's personal permalinks share the same NewBase60 datestamps, etc. Second, using your birthday as your 0-day for permalink datestamps would have the side-effect of publishing the precise year, month, and day of your birth, which not everyone wants to do. In fact, people typically still keep their full birthday private rather than publishing it openly on the web. Long term, if this encoding scheme is still used in, say, 200+ years, it may make sense to pick a new day zero for folks born after a certain point in time (e.g. perhaps 2200-001 for everyone born on that day or later).

 

