
Plaintext Style Markup

April 23rd 2009 02:07:39

I've been in search of the "holy grail" of plain text markup for years. Before the first Django version of this site, I was writing a custom mod_python handler to run a server-page style site and, as part of that, writing my own wiki creole. Since then, I've used TracWiki, reStructuredText, Markdown, MediaWiki, and Textile, but have been left slightly disappointed by them all.

TracWiki (a MoinMoin dialect) is great, but outside of Trac there's too much baggage. MoinMoin dialects in general suffer from ugly, verbose inline style markup. reStructuredText is what I try to use for all of my Python documentation, but I find its inline markup rules a bit odd and its flexibility for extension lacking. Markdown is what I use for this site, but the straight version has some very annoying markup rules, I don't like its URL linking, and without extensions a lot of things are a pain to do. MediaWiki and Textile I have the least experience with, so while I've used them, they didn't light my fire.

I suppose I have my own ideas about what I want certain things to do. I like most of Markdown's philosophy: being able to inline HTML when it's just not convenient to accomplish what you want in Markdown is good, and a focus on simplicity and readability of the source text is good. Being able to easily and naturally escape (using the standard escaping character, \, rather than Moin creole's !) is also nice. There are a few stylistic decisions I don't agree with; specifically, I don't like that _this_ italicizes and so does *this*. Python Markdown's extensions leave a bit to be desired, and the quoting and preformatted block support is really bad for just about every conceivable application of quotes or preformatted text.

So I started kinda formulating in my mind what my ideal markup language would be. Moin/TracWiki syntax looks too much like marked-up text, and Markdown has a few stylistic rough edges I could do without, so my ideal is probably a patois of the two. I prefer links in the style of [http://example.com link title], and I like styling my text with simple, non-distracting items like a single *.
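To make that concrete, here's a rough sketch of how those two inline rules could be rendered with Python's re module. The patterns and the render_inline name are made up for illustration, not taken from any of the parsers mentioned here, and they ignore escaping and nesting entirely.

```python
import html
import re

# Hypothetical patterns for the syntax described above (assumptions, not an
# existing dialect): [URL link title] for links, *text* for emphasis.
LINK = re.compile(r"\[(https?://[^\s\]]+)\s+([^\]]+)\]")
# Require a non-space character just inside each star so "2 * 3 * 4" is left alone.
EMPHASIS = re.compile(r"\*(\S(?:[^*\n]*\S)?)\*")


def render_inline(text):
    """Render the two inline rules to HTML; everything else passes through."""
    text = html.escape(text, quote=False)  # neutralise raw <, > and & first
    text = LINK.sub(r'<a href="\1">\2</a>', text)
    text = EMPHASIS.sub(r"<em>\1</em>", text)
    return text


if __name__ == "__main__":
    print(render_inline("See [http://example.com link title] for *simple* styling."))
```

Two substitutions in a fixed order is about the simplest thing that could work; a real implementation would also need to handle backslash escapes and stars that appear inside URLs.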

As I was doing this, I started looking at code to see what this kinda thing would entail. At first, I went straight to Ned Batchelder's parsing page, which is a fantastic resource for these things. I looked a bit into LEPL and picoparse, and while both were fascinating, that style of parser seems a lot more suited to strict grammars than to markups.

So, starting with TracWiki, I checked out the Python versions of the other parsers I've talked about. They all use a custom, regular-expression-based recursive descent parser. So now I'm just confused. The Python re module is written in C, so I already know that using regular expressions is very fast; faster than combining all sorts of strips and splits (which have Python string and list creation overheads associated with them). Do traditional parsing packages and lexers just not fit this problem?

As a final comment, I quickly examined the pygments package, but realized that it doesn't need to do any parsing (just lexing), and it uses regular expressions to lex, if that's a verb.
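For what "lexing with regular expressions" can look like, here's a minimal sketch that joins a handful of named patterns into one alternation and walks the text with finditer. The token names and patterns are my own assumptions for the markup ideas above; this is not how pygments or any of the wiki parsers actually define theirs.

```python
import re

# Hypothetical token set; order matters because earlier alternatives win.
TOKEN_SPEC = [
    ("ESCAPE", r"\\."),                    # backslash followed by any character
    ("LINK", r"\[[^\s\]]+\s+[^\]]+\]"),    # [target link title]
    ("STAR", r"\*"),                       # emphasis delimiter
    ("TEXT", r"[^\\\[*]+"),                # a run of ordinary characters
    ("OTHER", r"."),                       # a stray "[" or trailing backslash
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def lex(text):
    """Return (token_name, matched_text) pairs covering the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for token in lex(r"a *b* \* [http://example.com t]"):
        print(token)
```

The scanning happens inside the C regex engine in a single pass, and the Python side only sees finished tokens. Deciding which STAR opens emphasis and which closes it would be the parser's job, which this sketch leaves out.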
