Mixing Raw HTML and SXML

When I was coding up this web site, I thought for simplicity it would be best to allow myself to markup my posts directly in HTML. This would save me having to figure out how to integrate a markup language, and I wouldn't need to type my posts directly in SXML. Don't get me wrong, SXML is awesome, but not for writing most long form text.

Now, one of the advantages of using SXML for coding this site, was that it freed me from having to worry about most of the html issues I didn't really want to deal with, since it does things like automatic escaping of entities. However, this was precisely what I didn't want here, so we need a way to break out of it.

Here is my solution, (also available on github)


(define-module (render)
  #:use-module (sxml transform)
  #:use-module (web response)
  #:export (render-sxml universal-conversion-rules))
 
(define* (render-sxml sxml #:key (content-type 'text/html))
  (values
   (build-response #:code 200
                   #:headers `((content-type ,content-type)))
   (with-output-to-string
    (lambda ()
      (SRV:send-reply (pre-post-order sxml universal-conversion-rules))))))


(define universal-conversion-rules
  `(    ;;; mine
    (doctype . ,(lambda (tag) "<!doctype html>"))
    (raw *preorder* . ,(lambda (tag . rawstrings) rawstrings))
    (& . ,(lambda (tag str)
            (string-append "&" str ";")))
    ;; oleg's  universal-conversion-rules -- In the public domain
    (@
     ((*default*                         ; local override for attributes
       . ,(lambda (attr-key . value) (enattr attr-key value))))
     . ,(lambda (trigger . value) (cons '^ value)))
    (*default* . ,(lambda (tag . elems) (entag tag elems)))
    (*text* . ,(lambda (trigger str) 
		 (if (string? str) (string->goodHTML str) str)))))


;;; All the code below is taken from ssax's SXML-to-HTML.scm and
;;; util.scm so I don't need to import (wak ssax html). Also in the
;;; public domain.
(define (inc x) (+ x 1))

(define (entag tag elems)
  (if (and (pair? elems) (pair? (car elems)) (eq? '^ (caar elems)))
    (list #\newline #\< tag (cdar elems) #\>
      (and (pair? (cdr elems))
	(list (cdr elems) "</" tag #\>)))
    (list #\newline #\< tag #\> (and (pair? elems) (list elems "</" tag #\>))
      )))
 
(define (enattr attr-key value)
  (if (null? value) (list #\space attr-key)
    (list #\space attr-key "=\"" value #\")))


(define (make-char-quotator char-encoding)
  (let ((bad-chars (map car char-encoding)))

    ; Check to see if str contains one of the characters in charset,
    ; from the position i onward. If so, return that character's index.
    ; otherwise, return #f
    (define (index-cset str i charset)
      (let loop ((i i))
	(and (< i (string-length str))
	     (if (memv (string-ref str i) charset) i
		 (loop (inc i))))))

    ; The body of the function
    (lambda (str)
      (let ((bad-pos (index-cset str 0 bad-chars)))
	(if (not bad-pos) str	; str had all good chars
	    (let loop ((from 0) (to bad-pos))
	      (cond
	       ((>= from (string-length str)) '())
	       ((not to)
		(cons (substring str from (string-length str)) '()))
	       (else
		(let ((quoted-char
		       (cdr (assv (string-ref str to) char-encoding)))
		      (new-to 
		       (index-cset str (inc to) bad-chars)))
		  (if (< from to)
		      (cons
		       (substring str from to)
		       (cons quoted-char (loop (inc to) new-to)))
		      (cons quoted-char (loop (inc to) new-to))))))))))
))

(define string->goodHTML
  (make-char-quotator
   '((#\< . "&lt;") (#\> . "&gt;") (#\& . "&amp;") (#\" . "&quot;"))))

The purpose of this is to replace the usual call to sxml->xml in my code, with a different embellished version that uses a "raw" tag for content that isn't to be escaped. Also added were conveniences for adding the doctype and HTML entities.

A simple example of its usage is,


(define (sxml->string sxml)
  (with-output-to-string
    (lambda ()
      (SRV:send-reply (pre-post-order sxml universal-conversion-rules)))))

(sxml->string
 '((doctype)
   (html
    (head (title "title"))
    (body (p "This gets escaped &amp;gt;") (br)
          (raw "This doesn't get escaped &amp;lt;")
          (p "And this one gets escaped &amp;gt;")))))

;; Result:
;;
;;"<!doctype html>\n<html>\n<head>\n<title>title</title></head>\n<body>\n<p>This gets escaped &amp;amp;gt;</p>\n<br>This doesn't get escaped &amp;lt;\n<p>And this one gets escaped &amp;amp;gt;</p></body></html>"

Feel free to use this in your own pages, since oleg put his original code in the public domain, I feel I should do the same here. In the future, I may release a module of utilities for writing web pages, but for now, enjoy.