
Tuesday, August 28, 2012

MediaWiki Short URLs with IIS 7 and Subpages

Following the guide on mediawiki.org got me 99% there with nice short URLs on Windows with IIS 7.

One small hiccup: the suggested rule does not handle subpages of the form

wiki/PageTitle/SubPageTitle

The suggested Regular Expression provided by URL Rewrite needs a simple update.

<match url="^([^/]+)/?$" />

Should be

<match url="^([^/]+.+?)/*$" />

Or, as is done in the guide, with the wiki path

<match url="^(wiki/[^/]+.+?)/*$" />

Understanding the change requires reading the original regular expression.

  • "^" Matches the start of the string.
  • "[^/]" Inside brackets, "^" has a special meaning: it negates the set, so this matches any character except "/".
  • "[^/]+" The plus means to match one or more times.
  • "([^/]+)" The parentheses form a capture group, so whatever is matched inside is available as the first match. Put another way, this selects the output from the pattern match. "{R:1}" is this value being used.
  • "/?" Matches zero or one "/" (more on greediness later).
  • "$" Matches the end of the string.
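
To see the original pattern in action, here is a minimal sketch in Python. It assumes Python's re module treats this simple pattern the same way as the engine behind URL Rewrite; the helper name first_group is made up for illustration.

```python
import re

# The original pattern from the guide, as quoted above.
ORIGINAL = re.compile(r"^([^/]+)/?$")

def first_group(pattern, path):
    """Return capture group 1 (the value {R:1} would hold), or None if no match."""
    m = pattern.match(path)
    return m.group(1) if m else None

print(first_group(ORIGINAL, "PageTitle"))               # PageTitle
print(first_group(ORIGINAL, "PageTitle/"))              # PageTitle
print(first_group(ORIGINAL, "PageTitle/SubPageTitle"))  # None
```

The last call returns None because "[^/]+" stops at the first "/", and "/?$" then requires the string to end right after at most one slash.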

We would like it to match a title with "/" within it. MediaWiki will handle the details of splitting the title into subpages. We are already matching anything but "/", so we might as well match anything. In a regular expression, "." matches just about anything (any character except a newline). We still don't want a title to start with "/", so after the first run of non-"/" characters, let's accept anything up to the end of the string. We do that by simply inserting ".+", which matches one or more of any character except a newline.
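
As a rough check of that intermediate step, the sketch below inserts ".+" into the original pattern and changes nothing else. It is an illustration in Python under the assumption that its re module behaves like URL Rewrite's engine here; title_of is a hypothetical helper.

```python
import re

# Intermediate pattern: the original with ".+" inserted, still fully greedy.
WITH_DOT_PLUS = re.compile(r"^([^/]+.+)/?$")

def title_of(path):
    """Return the captured title, or None if the path does not match."""
    m = WITH_DOT_PLUS.match(path)
    return m.group(1) if m else None

print(title_of("PageTitle/SubPageTitle"))  # PageTitle/SubPageTitle
print(title_of("/PageTitle"))              # None, a title may not start with "/"
```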

As you might have noticed, the new patterns have two other changes: an additional "?" after ".+", and a "*" in place of the final "?". The reason comes down to greedy versus non-greedy matching. Consider the following path

wiki/PageTitle/SubPageTitle/

Where should the trailing "/" go? Should it be part of the title, or matched as an optional end of the path? The title should be "PageTitle/SubPageTitle" and not "PageTitle/SubPageTitle/", but both the title part, "([^/]+.+)", and the trailing-slash part, "/?", can match the final "/". Because ".+" is greedy, it takes as much as it can, and the slash ends up in the title.

What we do is add a "?" just after the ".+" to make it match non-greedily. When this is done, the title part matches as short a string as possible and leaves the trailing "/" to the end of the pattern. A single "/?" can only absorb one slash, though, so we also change it to "/*", which greedily matches zero or more trailing "/".
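
The difference between the variants can be compared side by side. This is a sketch in Python, assuming its re module handles greedy and lazy quantifiers the same way as URL Rewrite's engine; the constant names and title_of are made up for illustration.

```python
import re

GREEDY = re.compile(r"^([^/]+.+)/?$")     # ".+" inserted, nothing else changed
LAZY_ONE = re.compile(r"^([^/]+.+?)/?$")  # lazy title, at most one trailing "/"
FINAL = re.compile(r"^([^/]+.+?)/*$")     # the updated pattern from this post

def title_of(pattern, path):
    """Return the captured title, or None if the path does not match."""
    m = pattern.match(path)
    return m.group(1) if m else None

one_slash = "PageTitle/SubPageTitle/"
two_slashes = "PageTitle/SubPageTitle//"

print(title_of(GREEDY, one_slash))      # PageTitle/SubPageTitle/  (slash kept)
print(title_of(LAZY_ONE, one_slash))    # PageTitle/SubPageTitle
print(title_of(LAZY_ONE, two_slashes))  # PageTitle/SubPageTitle/  (one slash left)
print(title_of(FINAL, two_slashes))     # PageTitle/SubPageTitle
```

One side effect worth knowing: because ".+?" needs at least one character after "[^/]+", a one-character title such as "A" does not match the updated pattern. Using ".*?" instead would be one way to allow it, though that variant is an assumption of mine and not something the guide suggests.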

And there you have it, subpage support.

posted @ Tuesday, August 28, 2012 4:21 AM | Feedback (1) |
