© 2024 Paul Horton. All rights reserved.

A Light Introduction to formatting strings in Elisp

Elisp provides two approaches to formatting strings: one closely based on printf from the C language, and another more loosely based on Common Lisp format, a domain-specific language for formatting strings. The Common Lisp format language has many features and would require a lot of effort to learn thoroughly. However, it has the advantage of being concise, and it nicely covers some common cases which printf does not handle.

In Elisp "format" means (at least) two things!

A potential source of confusion is that Elisp provides a function named format, which is not Common Lisp format but rather something similar to printf in C or Perl. Fortunately, there is a package, cl-format, written by Andreas Politz, which is roughly a port of Common Lisp format to Elisp. At the time of this writing it is not in the GNU ELPA package archive, but it is available from some GitHub repositories, for example github:emacsmirror/cl-format.

Below I will sometimes use "CL Format" when describing shared aspects of cl-format and Common Lisp format. But please note that cl-format diverges from Common Lisp format significantly: not only does it not support all of Common Lisp format's functionality, but it also provides some additional functionality of its own.

Elisp format is basically printf

It is instructive to compare the system man page for the printf library function with the Elisp format docstring (shown by the describe-function command) or its description in the Elisp manual. To read man pages in Emacs, run the command man and enter "printf(3)" at the prompt. The "(3)" selects section 3, library calls; without it (on Linux, anyway) you would get the section 1 man page for the printf command-line utility.

As you can see, the descriptions of Elisp format and of the printf library call are very similar. In fact, the Elisp implementation of format (a C function named styled_format in the file editfns.c) seems to rely partly on the sprintf C library function to do its formatting. (This is one of the nice things about Emacs for programmers: if you aren't sure how an Emacs function or built-in works, you can literally "go to the source" and get some idea of how it works.)

Simple uses of cl-format to format strings in Elisp

The very simplest uses of cl-format can look similar to printf.
(defalias 'printf 'format);  To use the name printf instead

;;function     program         data
(printf        "%9.3f"     3.14159265358979)  -->  "    3.142"
(cl-format nil "~9,3f"     3.14159265358979)  -->  "    3.142"
;;         ^^^ nil here means just return the output as a string.
This example shows that the role of ~ in a cl-format program is roughly analogous to that of % in a printf program. The directive f also has a similar meaning, so one could say that %f in printf corresponds to ~f in cl-format. Other closely analogous directives include d, e, g, o, and x. (CL format also has ~b for binary output, which has no counterpart in Elisp format.)

For both functions, the 9 and 3 (in "%9.3f" and "~9,3f" respectively) also have the same meaning here: 9 is the minimum output string width and 3 is the number of digits after the decimal point. In this case, the period . in the printf program can be seen as loosely corresponding to the comma , in the cl-format program. The period in the printf example separates the "field width" from the "precision" (both optional), while commas in cl-format programs are used more generally to separate parameters, known as prefix parameters, whose number and meaning depend on the directive character (f in this case).
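Because the field width and the precision are both optional, it can help to see them used separately. Below is a small sketch using only the built-in Elisp format function and the built-in constant float-pi (3.141592653589793); the - and 0 flags shown are the usual printf ones, which Elisp format also accepts:

```elisp
;; Field width and precision with Elisp's printf-like format.
(format "%9.3f" float-pi)    ; width 9, precision 3     --> "    3.142"
(format "%.3f" float-pi)     ; precision only           --> "3.142"
(format "%9f" float-pi)      ; width only (precision 6) --> " 3.141593"
(format "%-9.3f|" float-pi)  ; "-" flag: left-justify   --> "3.142    |"
(format "%09.3f" float-pi)   ; "0" flag: pad with zeros --> "00003.142"
```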

Grouping Digits

In everyday life, we often group digits when writing big numbers (say 112071717889220). In English and many other languages, digits are conventionally grouped in threes: 112,071,717,889,220. To print numbers like this, one can use ~:d, the d directive of CL format with the colon modifier.
(cl-format nil "~:d" 112071717889220);; -->  "112,071,717,889,220"

Some languages, for example Chinese, Japanese and Korean, naturally group digits in fours. Here is an example of how to do that:

;;      fat space --> <---                 Units   兆    億    萬
(cl-format nil "¥~,,' ,4:d" 112071717889220) --> "¥112 0717 1788 9220"
;; approx. Japanese Government budget (fiscal year 2024)
;;
;; just for comparison, some similar expressions
;; Use '_' as a separator
(cl-format nil "¥~,,'_,4:d"  112071717889220) --> "¥112_0717_1788_9220"
;;
;; fourth parameter omitted --- defaults to 3
(cl-format nil "¥~,,'_:d"    112071717889220) --> "¥112_071_717_889_220"

;; Attempt to use string "__" as separator
(cl-format nil "¥~,,'__:d"   112071717889220);  No good. Signals error "Excess parameter given for directive ~_",...
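The format directive ~,,' ,4:d requires some explanation. Its core is ~d, which says to output an integer in decimal form. The colon modifier : before the d says to group digits. The three commas separate the four prefix parameters accepted by ~:d (in Common Lisp these are the minimum column width, the padding character, the separator character and the group size). In this example we set only the 3rd and 4th parameters; the first two commas ~,, simply skip over the first two parameters, so their default values are used.
For the third parameter we write a single quote ' followed by a character, which says to use that character as the separator; here I use a double-width space to give the groups some room. The fourth parameter, 4, sets the group size. Since the quote takes exactly one character, in "~,,'__:d" the second _ is presumably read as the directive character, which would explain the "Excess parameter given for directive ~_" error above.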
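As far as I know, the printf-like Elisp format has no digit-grouping flag, so without cl-format the grouping has to be done by hand. Below is a minimal sketch of such a helper; my-group-digits is a hypothetical name, not part of Emacs or cl-format, and it handles integers only:

```elisp
(defun my-group-digits (n &optional sep size)
  "Return integer N as a decimal string with its digits grouped.
SEP is the separator string (default \",\") and SIZE the group size
\(default 3).  Hypothetical helper, not part of Emacs or cl-format."
  (let ((digits (number-to-string (abs n)))
        (sep (or sep ","))
        (size (or size 3))
        (groups '()))
    ;; Peel SIZE digits at a time off the right-hand end of DIGITS,
    ;; pushing each group so the list ends up in left-to-right order.
    (while (> (length digits) size)
      (push (substring digits (- size)) groups)
      (setq digits (substring digits 0 (- (length digits) size))))
    (push digits groups)
    (concat (if (< n 0) "-" "") (mapconcat #'identity groups sep))))

(my-group-digits 112071717889220)        ; --> "112,071,717,889,220"
(my-group-digits 112071717889220 "_" 4)  ; --> "112_0717_1788_9220"
```

Unlike ~:d, this sketch takes no minimum-width or padding parameters; those would have to be added with an outer format call such as (format "%25s" ...).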

Basic output of string arguments

The Elisp format directives %s and %S, and the cl-format directives ~a and ~s, are all intended to format strings (and other types) as output, but they differ in detail.
;; (printf-like) format in Elisp
(format "%s" "dog");  --> returns length 3 string:  "dog"
(format "%S" "dog");  --> returns length 5 string: "\"dog\""
;;
;; cl-format in Elisp
(cl-format nil "~a" "dog"); --> returns length 3 string:  "dog"
(cl-format nil "~s" "dog"); --> returns length 5 string: "\"dog\""
(cl-format nil "~S" "dog"); Signals error "Unknown directive ~S"
;;
;; Using Common Lisp
CL-USER> (format nil "~a" "dog")
"dog"
CL-USER> (format nil "~s" "dog")
"\"dog\""
CL-USER> (format nil "~S" "dog");  ~s and ~S do the same.  ~A would also be the same as ~a
"\"dog\""
So in this case, the heavily used cl-format directive ~a behaves like %s, while ~s behaves like %S. Note also that while directives are case-insensitive in Common Lisp (i.e. ~s is equivalent to ~S), Elisp cl-format treats directives case-sensitively. cl-format predefines only the lower-case directives, freeing up the upper-case letters for use as potential new directives.
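The %s/%S distinction in Elisp format mirrors two built-in printing functions: %s produces the same text as princ, and %S the same text as prin1. The sketch below checks this with prin1-to-string, whose optional second argument (NOESCAPE) selects princ-style output:

```elisp
;; %s formats like princ; %S formats like prin1.
(let ((obj '("dog" cat 1.5)))
  (list (format "%s" obj)           ; --> "(dog cat 1.5)"
        (prin1-to-string obj t)     ; --> "(dog cat 1.5)"      NOESCAPE = t
        (format "%S" obj)           ; --> "(\"dog\" cat 1.5)"
        (prin1-to-string obj)))     ; --> "(\"dog\" cat 1.5)"
```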

In passing, note that all of these Lisp string-oriented directives accept non-strings as input (as does Perl's printf). This is unlike the C printf library function, whose %s directive requires a C string (a pointer to a NUL-terminated char array) as input.

;; In Elisp
(format "%s %s" 65 3.14);; -->  "65 3.14"
;;
(cl-format nil "~a ~a" 65 3.14);; -->  "65 3.14"
(cl-format nil "~s ~s" 65 3.14);; -->  "65 3.14"
;;
CL-USER> (format nil "~a ~a" 65 3.14);; -->  "65 3.14"
CL-USER> (format nil "~s ~s" 65 3.14);; -->  "65 3.14"
;;
;; Just for fun, in Perl
% perl -e 'printf( "%s %s", 65, 3.14 )'    OK. prints "65 3.14"
;;
;; In C, similar code compiles (with warnings),
;; but running it is undefined behavior and typically crashes.
printf( "this {%s} is not a string\n", 65 );   ---> Segmentation fault (typically)
printf( "this {%s} is not a string\n", 3.14 ); ---> Segmentation fault (typically)

An aside: Elisp format-spec

If this were all cl-format could do, there would be no reason not to just use the Elisp format function all the time. Elisp format is indeed useful and heavily used.

I won't cover Elisp format much here, since it is similar to printf, which is described in many places. I will, however, mention in passing that Elisp provides a function format-spec, akin to string interpolation with formatting control. Conveniently, printf-like modifiers for field padding, width and precision are provided automatically.

(let1 spec '((?年 . "2024")
             (?C . "© Copyright")
             (?著 . "Paul Horton")
             (?權 . "All rights reserved.")
             )
  (insert
     (format-spec ";;  %C %年, %著, %權\n"   spec)
     (format-spec ";;  %C %年, %20著, %權\n" spec);; 20 specifies the field width
     (format-spec ";;  %C %年, %-20著, %權"  spec);; -20: same width, but left-justified
  ))
;; inserts:
;;  © Copyright 2024, Paul Horton, All rights reserved.
;;  © Copyright 2024,          Paul Horton, All rights reserved.
;;  © Copyright 2024, Paul Horton         , All rights reserved.

;; or in ascii land try
(let1 spec '((?Y . "2024")
             (?C . "Copyright")
             (?A . "Paul Horton")
             (?R . "All rights reserved.")
             )
  (format-spec "%C %Y, %A, %R" spec)
  )
;; returns -->  "Copyright 2024, Paul Horton, All rights reserved."
let1 here is just a convenience wrapper around let, allowing (let ((varname val)) body...) to be written as (let1 varname val body...).
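The definition of let1 is not shown here, so the examples above will not run in a stock Emacs as written. One minimal way to define such a macro, assuming it only ever needs to bind a single variable, is sketched below (an assumed equivalent, not necessarily the definition these examples were written with):

```elisp
(defmacro let1 (var val &rest body)
  "Bind VAR to VAL, then evaluate BODY.
Shorthand for (let ((VAR VAL)) BODY...).  Assumed definition, for illustration."
  (declare (indent 2))
  `(let ((,var ,val)) ,@body))

(let1 x 2 (+ x 1))                 ; --> 3
(macroexpand '(let1 x 2 (+ x 1)))  ; --> (let ((x 2)) (+ x 1))
```

Because it expands into an ordinary let, it also dynamically rebinds a variable declared with defvar, which is what the later my-test-report-format example relies on.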

As the example shows, Chinese characters can also be used as directive characters. (For some reason the implementation of format-spec filters format strings with a regular expression which rejects most non-alphanumeric characters. The copyright symbol character ©, for example, cannot be used as a directive.)

In any case, format-spec seems to be a useful but relatively simple extension to printf.
cl-format, on the other hand, can do much more. Below I introduce a small fraction of its functionality and close with some links to more information.

Caveat: Excess Arguments Silently Ignored

When learning how to use these output formatting functions, it might be good to know that (for better or worse) they happily ignore extra arguments.
;; (printf-like) format in Elisp
(format "stuff" "extra stuff ignored");; -->  "stuff"    No error.
;; cl-format in Elisp
(cl-format nil "stuff" "extra stuff ignored");; -->  "stuff"    No error.
;;
;; Common Lisp
CL-USER> (format nil "stuff" "extra stuff ignored");;  -->  "stuff"    No error.
;;
;; Just for fun, in Perl
% perl -e 'printf( "stuff", "extra stuff ignored" )'   prints "stuff"   No error.
;;
;; Even works in C (although gcc -Wall gives a compiler warning)
printf( "stuff", "extra stuff ignored" ); --> prints "stuff"   No error.
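The tolerance is one-sided, at least for Elisp format: supplying fewer arguments than the format string needs signals an error. A small check with the built-in format function (the symbol not-enough-arguments is just an arbitrary marker returned by the error handler):

```elisp
;; Extra arguments are silently ignored ...
(format "stuff" "extra stuff ignored")   ; --> "stuff"
;; ... but too few arguments signal an error.
(condition-case nil
    (format "%s and %s" "cat")           ; two %s directives, one argument
  (error 'not-enough-arguments))         ; --> not-enough-arguments
```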

Newline and Fresh Line

The cl-format directives ~% and ~& are similar but distinct. ~% unconditionally outputs a newline. On the other hand, the "fresh line" directive ~& usually outputs a newline, but refrains from doing so when it knows the output is already at the beginning of a line (where a newline would produce a double newline).
;; cl-format in Elisp
;; Below, I use ' ␤ ' to denote one newline character
(cl-format nil "cat~%dog")    -->      cat ␤ dog
(cl-format nil "cat~%~%dog")  -->      cat ␤ ␤ dog
(cl-format nil "cat~&dog")    -->      cat ␤ dog
(cl-format nil "cat~&~&dog")  -->      cat ␤ dog
(cl-format nil "cat~&~%dog")  -->      cat ␤ ␤ dog
(cl-format nil "cat~%~&dog")  -->      cat ␤ dog
;;
;; elisp strings can include a literal newline character
(cl-format nil "cat\ndog")    -->      cat ␤ dog
(cl-format nil "cat\n~&dog")  -->      cat ␤ dog
;;
;; Using Common Lisp
CL-USER> (format nil "cat~%dog")    -->  cat ␤ dog
CL-USER> (format nil "cat~%~%dog")  -->  cat ␤ ␤ dog
CL-USER> (format nil "cat~&dog")    -->  cat ␤ dog
CL-USER> (format nil "cat~&~&dog")  -->  cat ␤ dog
CL-USER> (format nil "cat~&~%dog")  -->  cat ␤ ␤ dog
CL-USER> (format nil "cat~%~&dog")  -->  cat ␤ dog
;;
;; In Common Lisp "\n" in a string literal is just a plain 'n', not a newline
CL-USER> (format nil "cat\ndog")    -->  catndog
CL-USER> (format nil "cat\n~&dog")  -->  catn ␤ dog
The examples above illustrate the difference between the ~& and ~% directives and also a difference between the way Elisp and Common Lisp interpret \n in a literal string.

~% and ~& can also accept a number-of-newlines numerical argument. For example, ~3% outputs 3 newlines unconditionally, while ~3& acts like ~&~2%.

;; in Elisp
(cl-format nil "cat~3%dog")   -->  "cat ␤ ␤ ␤ dog"
(cl-format nil "cat~3&dog")   -->  "cat ␤ ␤ ␤ dog"
(cl-format nil "cat\n~3%dog")  --> "cat ␤ ␤ ␤ ␤ dog"
(cl-format nil "cat\n~3&dog")  --> "cat ␤ ␤ ␤ dog"
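Plain Elisp strings have no ~& directive, but its effect on a string built up in memory can be imitated by checking whether the text so far is empty or already ends in a newline. A minimal sketch; my-fresh-line is a hypothetical helper, and unlike ~& it knows nothing about an output stream's current column:

```elisp
(defun my-fresh-line (s)
  "Hypothetical helper: append a newline to S unless S is empty
or already ends with one, roughly what ~& would do at the end of S."
  (if (or (equal s "") (string-suffix-p "\n" s))
      s
    (concat s "\n")))

(concat (my-fresh-line "cat") "dog")     ; --> "cat\ndog"  like "cat~&dog"
(concat (my-fresh-line "cat\n") "dog")   ; --> "cat\ndog"  like "cat\n~&dog"
(my-fresh-line (my-fresh-line "cat"))    ; --> "cat\n"     like "cat~&~&"
```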

Elisp terpri can also output "fresh lines"

Common Lisp provides the functions terpri and fresh-line for outputting a newline unconditionally and for ensuring a fresh line, respectively.
In Elisp, terpri does both jobs, depending on its optional second argument ENSURE.
;; in Common Lisp
CL-USER> (terpri)␤      ≺-- this newline output by slime REPL (after the input is entered)
␤                       ≺-- this newline output by terpri
NIL                     |--- return value of terpri, displayed by slime REPL
CL-USER> (fresh-line)␤  ≺-- this newline output by slime REPL; fresh-line outputs none in this case
NIL                     |--- return value of fresh-line, displayed by slime REPL
;;
;; in Elisp
(progn;;  this inserts two newlines into the current buffer
  ;;        OUTSTREAM      ENSURE
  (terpri (current-buffer) nil)
  (terpri (current-buffer) nil)
  )
(progn;;  this inserts one newline into the current buffer
  ;;        OUTSTREAM      ENSURE
  (terpri (current-buffer)  nil)
  (terpri (current-buffer)    t);;  Like Common Lisp fresh-line
  )
Again, I use the Unicode character ␤ to show where newlines are printed. Note that this example uses (current-buffer) as the output stream; use t instead to send output to the echo area.

Tabbing Compared

Outputting tab characters

In Elisp source code, the tab character (ASCII code 9) can be represented as the integer 9, or with \t, both in strings ("\t") and as a character literal (?\t). Common Lisp, on the other hand, (only?) provides #\Tab to represent a literal tab character. This difference affects the number of options available when writing code which outputs tab characters.
;; cl-format in Elisp
(cl-format nil "cat\tdog")      -->          "cat	dog"   (with a tab char between cat and dog)
(cl-format nil "cat~cdog" ?\t)  -->          "cat	dog"   (with a tab char between cat and dog)
(cl-format nil "cat~cdog" 9)    -->          "cat	dog"   (with a tab char between cat and dog)
;;
;; Common Lisp
;; Here, "\t" in the string literal just means a plain 't'
CL-USER> (format nil "cat\tdog")       -->   "cattdog"
CL-USER> (format nil "cat~cdog" #\Tab) -->   "cat	dog"   (with a tab char between cat and dog)
CL-USER> (format nil "cat~cdog" 9)     !! Signals type error:  9 is not a character
In summary, comparing Elisp to Common Lisp:
1) \t (and \n) in string literals are treated differently.
2) Passing the c directive a literal representation of the tab character works similarly in both.
3) In Common Lisp, characters and integers are distinct types, but in Elisp "characters" are in fact non-negative integers.
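Point 3 is easy to confirm in Elisp itself; the following expressions, using only built-in functions, all evaluate to t:

```elisp
;; In Elisp a "character" is just a non-negative integer.
(= ?\t 9)                                  ; --> t
(characterp 9)                             ; --> t
(integerp ?\t)                             ; --> t
(equal (string 9) "\t")                    ; --> t
(equal (format "cat%cdog" 9) "cat\tdog")   ; --> t, %c accepts the integer 9
```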

"Tabbing" using space characters

Common Lisp format has a "tab stop" directive, ~t, which can be used to format tabular output. ~t does not output tab characters; instead it outputs one or more spaces in an attempt to produce the desired spacing. I don't think it has a counterpart in standard printf (or the Elisp format function). At first glance it seems similar to a field width specifier, but as the example below shows, it is distinct.
;; (the printf-like) format function in Elisp
(format   "%-8sdog" "cat")               "cat     dog"
(format "%d%-8sdog" 9 "cat")             "9cat     dog"
;; cl-format in Elisp
(cl-format nil   "~8adog" "cat")         "cat     dog"
(cl-format nil "~d~8adog" 9 "cat")       "9cat     dog"
(cl-format nil "cat~8tdog")              "cat     dog"
(cl-format nil "~dcat~8tdog" 9)          "9cat    dog"
;;
;; In Common Lisp
CL-USER> (format nil "~8adog" "cat")     "cat     dog"
CL-USER> (format nil "~d~8adog" 9 "cat") "9cat     dog"
CL-USER> (format nil "cat~8tdog")        "cat     dog"
CL-USER> (format nil "~dcat~8tdog" 9)    "9cat    dog"
Note that the ~8t directive places "dog" in the same position regardless of whether the output has a preceding "9". This is not so for the %-8s and ~8a directives since they only consider the width of their argument ("cat" in this case).
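The ~8t behavior can be approximated in plain Elisp by padding the text produced so far out to a target column. A simplified sketch; my-pad-to-column is a hypothetical helper that counts characters rather than display width, and when the text already reaches the column it just adds one space, which only approximates what ~t does in that case:

```elisp
(defun my-pad-to-column (s col)
  "Hypothetical helper: pad S with spaces so following text starts at COL.
COL is zero-based, as with ~COLt.  If S already reaches COL, add one space.
Counts characters with `length', not display width."
  (concat s (make-string (max 1 (- col (length s))) ?\s)))

(concat (my-pad-to-column "cat" 8) "dog")               ; --> "cat     dog"
(concat (my-pad-to-column (format "%dcat" 9) 8) "dog")  ; --> "9cat    dog"
```

As with ~8t, "dog" lands at column 8 in both cases, because the padding depends on everything output so far rather than on the width of a single argument.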

Formatting fixed-width text tables seems to be a strength of Common Lisp format compared to printf, and I think this could be rather useful for presenting information in human-friendly form. I've only scratched the surface here; please see the links to more information at the bottom.

Conditional Directive

Conditional Directive; Boolean Case

When used with a colon modifier, in the form ~:[alt_false~;alt_true~], the conditional directive outputs one of two alternatives depending on whether its argument is nil (false) or non-nil (true).
(defun my-test-report/ver1 (result)
   ;; using printf-like format
   (format "test %sed." (if result "pass" "fail"))
   ;; analogous to C code:   printf( "test %sed",  result? "pass" : "fail");
   )
(defun my-test-report/ver2 (result)
   (cl-format nil "test ~:[fail~;pass~]ed." result)
   )
(my-test-report/ver1 (= 2 (+ 1 1))) -->  "test passed."
(my-test-report/ver2 (= 2 (+ 1 1))) -->  "test passed."
(my-test-report/ver1 (= 5 (+ 1 1))) -->  "test failed."
(my-test-report/ver2 (= 5 (+ 1 1))) -->  "test failed."
This idea is borrowed from an example given in Chapter 9 of Peter Seibel's Practical Common Lisp book.
In this simplified example, my-test-report/ver1, using printf-like Elisp format, perhaps looks nicer than the second version using ~:[ with cl-format, if for no other reason than that I'm more familiar with printf-style directives.

However, the approach using ~:[ with cl-format has the advantage of placing the formatting information together in a single string. This could make it easier to share the formatting across functions or to customize it via local binding of a variable holding the formatting string. For example:

(defvar my-test-report-format  "~:[fail~;pass~]ed.")

(defun my-test-report/ver3 (result)
   (cl-format nil my-test-report-format result)
   )

(format "test1: %s   test2: %s"
 (my-test-report/ver3 (= 2 (+ 1 1)))
 (my-test-report/ver3 (= 5 (+ 1 1)))
) -->  "test1: passed.   test2: failed."

;; locally rebind the formatting string (Chinese: 測試 = test, 成功 = success, 失敗 = failure)
(let1  my-test-report-format  "~:[失敗~;成功~]了"
  (format "測試1: %s   測試2: %s"
   (my-test-report/ver3 (= 2 (+ 1 1)))
   (my-test-report/ver3 (= 5 (+ 1 1)))
)) -->  "測試1: 成功了   測試2: 失敗了"

Conditional Directive; Multiple Choices

More generally, the argument of the ~[ directive can be an integer index into an explicit list of choices. For example:
;; Using cl-format
(defun French-number/ver1 (i)
   (cl-format nil "~[zéro~;un~;deux~;trois~]" i)
)

;; Using (printf-like) format
(defun French-number/ver2 (i)
  (format "%s" (aref ["zéro" "un" "deux" "trois"] i))
)

;; The two versions work the same when i is in range
(French-number/ver1 2);; --> "deux"
(French-number/ver2 2);; --> "deux"

;; But behave differently when i is out of range
(French-number/ver1 5);; --> ""     (No error!)
(French-number/ver2 5);; -->  error: (args-out-of-range ["zéro" "un" "deux" "trois"] 5)
;; Common Lisp behaves similarly to cl-format
;;
CL-USER> (format nil "~[zéro~;un~;deux~;trois~]" 2)
"deux"
CL-USER> (format nil "~[zéro~;un~;deux~;trois~]" 5)
""   ≺--  returns empty string ""
Here too, in this simple example, using the cl-format ~[ directive to select a string is not necessarily better than using ordinary Lisp code to make the selection before formatting. And in the out-of-range case, I think I would usually prefer an error to be signaled. But again, in some circumstances it could be useful to have the formatting information all together in a single string.
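If the forgiving out-of-range behavior of ~[ is actually what you want, plain Lisp can provide it too, since nth returns nil when the index is past the end of a list. A sketch; French-number/ver3 is a hypothetical third version following the naming above:

```elisp
(defun French-number/ver3 (i)
  "Hypothetical: return the French word for I (0-3), or \"\" if out of range."
  ;; nth returns nil past the end of the list; natnump rejects negative I,
  ;; for which nth would otherwise return the first element.
  (or (and (natnump i) (nth i '("zéro" "un" "deux" "trois")))
      ""))

(French-number/ver3 2) ; --> "deux"
(French-number/ver3 5) ; --> ""     (No error, like ver1)
```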

Format list with a delimiter between items

When doing quick coding (for debugging, for example), how many times have you output strings like "0, 1, 2, " when what you really wanted was "0, 1, 2" with no trailing comma? I know I have done it a lot, because it's slightly easier. Fortunately, cl-format makes it easy to output a list in the more natural form.
(cl-format nil "~{~a~^, ~}" ())        -->  ""
(cl-format nil "~{~a~^, ~}" '(a))      -->  "a"
(cl-format nil "~{~a~^, ~}" '(a b))    -->  "a, b"
(cl-format nil "~{~a~^, ~}" '(a b c))  -->  "a, b, c"
;;
;; Or use " | " as delimiter
(cl-format nil "~{~a~^ | ~}" '(a b c)) -->  "a | b | c"
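The control string "~{~a~^ | ~}" used in this example can be parsed in the following way. The pair "~{ ... ~}" on the outside means to consume a list of values, applying the enclosed directives to each element in turn. Inside, ~a outputs the current element, and ~^ ends the iteration early when no elements remain, so the delimiter " | " that follows it is output only between elements, never after the last one.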
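Without cl-format, the built-in mapconcat gives the same separator-only-between-items behavior, because it inserts its separator argument between elements, never after the last one. A sketch mirroring the examples above, with (format "%s" x) standing in for ~a:

```elisp
(mapconcat (lambda (x) (format "%s" x)) '() ", ")        ; --> ""
(mapconcat (lambda (x) (format "%s" x)) '(a) ", ")       ; --> "a"
(mapconcat (lambda (x) (format "%s" x)) '(a b c) ", ")   ; --> "a, b, c"
(mapconcat (lambda (x) (format "%s" x)) '(a b c) " | ")  ; --> "a | b | c"
```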

cl-format and character case

cl-format does not perfectly emulate the Common Lisp format function (far from it, actually). Some of the code snippets below use Common Lisp in Emacs via slime; the CL-USER> prompt means the code is Common Lisp, not Elisp.
In the real Common Lisp format the directive ~( can be used to downcase, upcase or capitalize.
CL-USER> (format nil "~(~a~)" "CAT");  This is Common Lisp format, not Elisp format
"cat"
One place in which they differ: Elisp cl-format does not use ~( as a case-controlling directive, but instead uses parentheses as part of a more extensible syntax for directives which enclose part of the format specification. The syntax is ~Χ(...~), where the letter Χ stands for a directive character. In particular, Elisp cl-format uses ~|(...~) for case control.
(cl-format nil "~(~a~)" "CAT");  Elisp
;; --> Signals error:
;; cl-format-parse-error: Error parsing cl-format string: "Unknown directive ~( ,use ~|(...~)", "~(~a~)", 2
;;
;; But fortunately this works
(cl-format nil "~|(~a~)" "CAT"); --> returns "cat"
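In plain Elisp, case conversion is done with ordinary functions applied to a string before or after formatting. For reference, the built-in counterparts of downcasing, upcasing and capitalizing:

```elisp
(downcase "CAT")                 ; --> "cat"
(upcase "cat")                   ; --> "CAT"
(capitalize "cAT dOG")           ; --> "Cat Dog"
(upcase-initials "cAT dOG")      ; --> "CAT DOG"  (other letters left alone)
(format "%s!" (downcase "CAT"))  ; --> "cat!"
```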

cl-format does not play well with the Elisp upcase/downcase mechanism?

The Elisp mechanism for upcasing, downcasing and performing case insensitive string matching is implemented using data structures called "case tables". The mechanism is a flexible one which can be used for much more than simply converting "a" to "A".
(Writing up an example is on my todo list).

Unfortunately it seems with-case-table cannot be used to customize the behavior of the cl-format ~| directive.

(let (
      (my-case-table (make-char-table 'case-table))
      (test-string  "乘 AND 乗, 黏 AND 粘")
      )
  (aset my-case-table ?乗 ?乘)
  (aset my-case-table ?粘 ?黏)
  (with-case-table my-case-table
    (format "downcase: %s  cl-format: %s"
     (downcase test-string)
     (cl-format nil "~|(~a~)" test-string)
    )
  ));  returns
;;    "downcase: 乘 AND 乘, 黏 AND 黏  cl-format: 乘 and 乗, 黏 and 粘"
The example first sets up a tiny case table capable of canonicalizing a couple of pairs of Chinese character variants (which perhaps should match each other when searching strings), and then illustrates the effect that using this case table via with-case-table has on the enclosed expressions. As shown above, with-case-table evidently affects downcase but not cl-format. Note that the downcase return value canonicalizes only the Chinese characters, leaving "AND" unchanged (as expected), while the cl-format return value does the opposite: it downcases "AND" but leaves the Chinese characters as they are.

I'm not sure why, but perhaps the implementation of cl-format edits strings in a temporary buffer without arranging for it to inherit the current case table (and instead uses the standard case table by default?). It might be an interesting project to try to modify cl-format to respect with-case-table. A change like that might even make it into Emacs someday!

Where to learn more about Common Lisp format?

As for information specific to the Elisp version:
I have not seen Info pages for cl-format, but it does have an especially detailed docstring (use describe-function to see it).