August 22, 2023
On my third attempt at solving the problem of searching JavaScript source code for statements instead of text, I finally arrived at an interface that is both functional and elegant. It has also stirred some controversy so I’ve set out to put into words (1) why I perceive this to be the best solution in the design space visible to me, and (2) the road that led me to it.
The scanner program, esgrep, works by accepting a query from the user that describes a target statement they're looking for; it then scans the source's syntax tree for matching constructs and finally prints the relevant portions. Implementation aside, the user interface itself, or rather, the way the target is described, has been a stubborn problem to address.
Let me explain.
If I were to envision the ideal language to host such queries, I would describe it as follows, in order of importance: expressive, transparent, simple, expansive, succinct, and abstract.
In pursuit of such a host, I've explored different languages and approaches, including SQL, XPath, CSS selectors, and jq.
SQL was ruled out early on as it can get verbose and would be impractical for use on the command line. Ideally, I want queries to fit on a single line; multiline statements in terminal emulators do not make for the greatest experience.
In JavaScript’s ecosystem, CSS selectors are used to query the DOM, which has the shape of a tree. Since the scanner’s input is also a tree, my first attempt was to use CSS selector syntax and semantics to express AST selection. It was a natural first choice because it’d be familiar to my audience.
The general idea is to:

- map language constructs to element selectors, e.g. call
- map node properties to attribute selectors, e.g. {async: true} would map to fun[async]
- map node nesting to the child combinator, e.g. call > arg
- map scanner operators to the pseudo-class : operator, e.g. a positional filter to select the first argument in a function call would map to call > arg:nth-of-type(1)
To judge how well the language can carry the semantics of the scanner, let’s map some AST selections to CSS selectors.
Test #1: a string literal of any value:
str
While probably not the most useful selector on its own, it still mapped well: clear and succinct. What if the target was a string that contains the word "hello" in it? We'll revisit the str selector to support a value attribute that the user can fill in.
Test #2: a string literal containing the word “hello”:
str[value*="hello"]
The introduction of a named attribute (value) means that every user of this selector is now required to remember its name to use the selector at all. You could argue that that's a good thing, or a leak, but let's reserve judgment.
The scanner works by applying filters to AST nodes, which are functions that map a node to a boolean. But it also has operators that may or may not map to a value. One such operator is ref(), which dereferences a literal value to select its references. We'll expose it as an element function.
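To make that model concrete, here is a minimal TypeScript sketch of filters and the ref() operator. The AstNode shape, the helper names, and the array-of-references representation are hypothetical stand-ins, not esgrep's internals:

```typescript
// Hypothetical node shape for illustration; not esgrep's internals.
type AstNode = { type: string; value?: string; refs?: AstNode[] };

// A filter maps a node to a boolean.
type Filter = (node: AstNode) => boolean;

const isStr: Filter = (node) => node.type === "str";
const contains = (word: string): Filter => (node) =>
  (node.value ?? "").includes(word);

// An operator maps a selection to another selection; ref() dereferences
// a literal to the nodes that reference it.
const ref = (nodes: AstNode[]): AstNode[] =>
  nodes.flatMap((n) => n.refs ?? []);

// "a reference to a string literal that contains the word 'hello'":
const nodes: AstNode[] = [
  { type: "str", value: "hello world", refs: [{ type: "id", value: "greeting" }] },
  { type: "str", value: "goodbye" },
];
const selection = ref(nodes.filter((n) => isStr(n) && contains("hello")(n)));
// selection holds the single { type: "id", value: "greeting" } node
```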
Test #3: a reference to a string literal that contains the word “hello”:
str[value*="hello"]:ref()
I like it. Composition is one of the defining features of declarative query languages, so let’s add support for a union operator that would select x or y.
Test #4: a string literal that contains the word “hello” or a template literal that contains the word “hello” in one of its spans:
str[value*="hello"], tpl[quasis*="hello"]
Do you notice the second leak in the interface? Suddenly, the user has to know about an implementation detail of the lexical components of a TemplateLiteral and remember it in order to select one. “What’s a quasis?” would be the first thing for them to ask, just like I did in the past and don’t necessarily feel the need to share the joy.
On the positive side, CSS selectors already have union support through the , (comma) operator, so the translation mapped nicely. There is also native support for complements through the :not() function, and for intersection (on attributes) by conjoining their brackets a la fun[name="x"][async].
Watch, though, how it starts to fall apart when attribute values themselves become selectors. Pick the most basic and common selector, id(), which selects statements with a matching Identifier, and apply it to an attribute.
Test #5: find a call to the function x:
call[receiver=id[value="x"]]
We’re quickly exiting familiar CSS selector territory. I have yet to see such a CSS selector, and the syntax highlighter on this page choking on the inner expression id[value="x"] suggests that I never will: this isn’t valid CSS anymore! So, how to solve this? I could get crafty and find ways to extend the syntax so that it closely resembles valid CSS, but it still wouldn’t be. And I’d need to communicate that to the user. That’s not the worst of it, though: for them to issue such a query, they must always be aware of this divergence and of how I chose to address it.
You could say that id[value="x"] is overstated, and I will agree, but keep in mind that the scanner needs to accept selectors in far more positions than just the root. Attributes should be selectable, with id() or otherwise. If we go back to Test #3, where references to a string were selected, what if we were looking for a call to the function x made with such references as its last argument?
call[receiver=id[value="x"]]
  > arg[value=str[value*="hello"]:ref()]:last()
Yes, it can be expressed. Can we still reasonably compose the str part, though? It might be a template literal, or a string, and we don’t particularly care:
call[receiver=id[value="x"]]
  > arg[value=str[value*="hello"]:ref(),
        tpl[quasis*="hello"]:ref()]:last()
Reasonable is a subjective term, but I think it’s fair to say that the syntax has too many primitives. There’s also hardly any consistency – I’d need to constantly consult a manual to write such a query. How do you even indent it? Merely writing this example took a substantial mental effort!
Retracing my thoughts: in the process of writing this query, my mind had to remember the following:

- the element names: call, arg, str, and tpl
- the attribute syntax: [x=y]
- the attribute names: receiver for call, value for id, arg, and str, and quasis for tpl
- the contains() semantic that I need, which is actually invoked as [x*=y]
- the attribute name value (although I almost typed it as name, and I’m the author!)
- whether the target is the call or a descendant of it, and if it’s the latter, how I navigate the axis: x > y
- that ref is an operator, invoked as :x()
- the union operator: x,y
- the positional filter last(), invoked as x:f()
I want to minimize this mental flow. Perhaps we can do better by converting the argument’s value expression into a descendant of it; then we may employ the axis operator (>) to tidy things up:
call[receiver=id[value="x"]]
  > arg:last()
  > str[value*="hello"]:ref(),
    tpl[quasis*="hello"]:ref()
Better! Except that this decision was completely arbitrary and even then, it can only work for nodes like call arguments that comprise exactly one expression.
All in all, this is a painful experience. So maybe hijacking languages made for querying XML nodes was not such a great idea after all. What if we introduce a DSL instead? With complete control over the syntax, surely we can do better.
Perhaps the biggest advantage to building a Domain Specific Language (DSL) is in the degree to which it lets you tailor an experience. And what I ended up doing counteracted that advantage entirely…
First, I knew what I did not want the language to be: a novel one. This is in no small part due to jq, one of the references I made early in the post. I’ve used it a lot over the years and it’s excellent – once I get it to work. It seemed that no matter how much I used it, I was not becoming adept at it, which prompted me to look into its design and try to learn why.
This is entirely a subjective take and not a formal critique. I find jq to suffer from an identity crisis in its interface, where it looks like JavaScript, and Bash, and JSON, while being none of them. That would normally be fine if only the language were more consistent.
For example, I find myself asking these and similar questions every time I use it:

- Do I compose filters with parentheses (()), brackets ([]), or a pipe (|)?
- Do I use map, like map(f), or [.[] | f]?
- When I use the . (dot) operator to index an array and need to wrap the result back into an array to pass it through another filter, do I use [] to surround the point of deconstruction, the enclosing body, or the filter call site?
- Do I use the . (dot) operator to index objects too? What about properties that have spaces in their names, do I wrap them with quotes?

To be fair, a glance at the manual usually unblocks me. What I took away the most was that I did not appreciate the different ways jq lets me do the same thing. A choice with no impact… it only adds fatigue. And, to a lesser extent, the number of primitives that I need to keep in cognitive reach at all times is tiresome.
Okay, so what do I want the language to be, then? What I chose was: a familiar one. Naturally, what’s most familiar to a JavaScript audience is JavaScript itself.
Welcome to my first and biggest mistake…
By voluntarily setting the constraint that the language ought to look like another, I made a full circle back to my earlier experience with XPath and CSS. The design decisions were constantly made in light of “Well, how does JavaScript do it?”, followed by a search for an adaptation that may or may not resemble the grandfather. Outside such an artificial constraint, the driving factor would’ve been: “What is the simplest way to do this?”
This is what it looked like:
// Any string
:string

// Any object with 0 or more properties
:object

// A string or an object (composition)
(:string | :object)

// The "foo" string
"foo"

// Object with 0 properties (i.e. an empty object)
{}

// Object with the "a" property
{ a }

// Object with properties that are neither "a" nor "b"
{ ^a, ^b }

// Call to function "f"
f()

// Call to function "f" on "this"
this.f()

// Call to function "f" on either "this" or "store"
(this | store).f()

// Call to function "then" (e.g. Promise) with the second
// argument being a function that returns another Promise
// using the "new" keyword
**.then(*, :func(*, :of(Promise)))

// an instance of the default export of the module "class.js"
:of(:exportOf(class.js))

// a RegExp literal with "foo" for a pattern
/foo/

// A Link JSX element without an href attribute
<Link ^href />
And this is how it would (somewhat) compare to the previous language:
// a string of any value
:string
// a string with the word "hello":
"hello"
// a string with the word "hello", or any reference to it:
:ref("hello")
// a string or template literal with *exactly* the word "hello":
"hello" | `hello`
// a call to the function x
x()
// a call to the function x with the last argument being a string
// with exactly the word "hello", or any reference to it:
x(**, :ref("hello"))
// the above, but with either a string or a template literal:
x(**, :ref("hello") | :ref(`hello`))
You can see the rest of it in its manual page, which unironically has a preamble on how to read the manual itself. You know what, it might make for a good demo; it looks quirky and curious and inviting. But go ahead and use it, and I guarantee you won’t go back to it. Not willingly, at least.
Again, it showed the same or similar symptoms to the previous approaches. This pattern would repeat itself: an existing, familiar language translates mostly, but not quite entirely, well into what the scanner needs to receive in order to operate. And I would resist the urge to design a new one because of the learning overhead it would incur on users.
That is, until I tried to apply symbolic expressions to the problem. Suddenly, it was no longer possible to resist.
“Elegance is beauty that shows unusual effectiveness and simplicity.” -Wikipedia
I can hardly find a better testament to this statement (emphasis mine) than in what SEXP did to esgrep’s query language problem. I’m under no illusion that the solution is free of shortcomings, but from where I stand, SEXP simply smashes the competition.
Without further ado, behold my newfound love:
; a string of any value
(str)

; a string with the word "hello":
(str /hello/)

; a string with the word "hello", or any reference to it:
(ref (str /hello/))

; a string or a template literal with the word "hello":
(or (str /hello/) (tpl /hello/))

; a call to the function x
(call x)

; a call to the function x with the last argument being a string
; with the word "hello", or any reference to it:
(call x (arg -1 (ref (str /hello/))))

; the above, but with either a string or a template literal:
(call x (arg -1 (ref (or (str /hello/)
                         (tpl /hello/)))))

; the above again, but this time with a superselector to save the day:
(call x (arg -1 (ref (str+ /hello/))))
LOVE, LOVE, LOVE. Listen. The parentheses suck, yes. Look past that, right to where the elegance is bound to strike you: just about everything the scanner performs can be expressed in precisely two syntax primitives: an atom, and a list. An atom is just an atom, while a list may contain atoms or other lists in turn, and onwards to your heart’s desire.
Language construct selectors? A list. Composition? A list of lists. Dereferencing? A list and an operand that selects the referent, which is also a list!
That there are only two primitives has forced me to treat them as the precious commodity that they are and pay great attention to what and where things are and aren’t allowed. I can only regard that as beneficial both to myself as a maintainer and to users, as the coming sections aim to substantiate.
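A grammar of two primitives also keeps the machinery honest. Here is a minimal sketch of a parser for atoms and lists, for illustration only; esgrep's actual parser is assumed to handle more (string and regex atoms, error reporting, and so on):

```typescript
// An atom is a string token; a list is an array of atoms and lists.
type Sexp = string | Sexp[];

function parse(input: string): Sexp {
  // Pad parens with spaces so a whitespace split yields tokens.
  const tokens = input
    .replace(/\(/g, " ( ")
    .replace(/\)/g, " ) ")
    .trim()
    .split(/\s+/);
  let pos = 0;
  function read(): Sexp {
    const token = tokens[pos++];
    if (token !== "(") return token; // an atom is just an atom
    const list: Sexp[] = [];
    while (tokens[pos] !== ")") list.push(read()); // lists nest freely
    pos++; // consume the closing ")"
    return list;
  }
  return read();
}

// parse("(call x)") yields ["call", "x"]
```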
Atoms first.
Atoms - wherever they appear - represent parameters supplied by the user. This includes identifiers, literal values, locations, and qualifiers, like x, "hello", 1, and async.
The syntax being tight leaves little room for ambiguity, which in turn gives me room to optimize for the 80% use case. A prime example of this is how atoms can stand in [1] for identifier selectors: an atom like fetch, that is not surrounded by any symbol, expands into (id fetch):
(call fetch)        ; is equivalent to
(call (id fetch))
This is tremendously useful as you have to deal with identifiers very often. It saves you keystrokes and reads better.
When surrounded by "" (double quotes), atoms expand into a string literal selector:
(call fetch (arg 1 "/users"))       ; is equivalent to
(call fetch (arg 1 (str /users)))
And when surrounded by // (forward slashes), atoms expand into a regex pattern selector:
(str /foo/i)        ; is equivalent to
(str (pat foo i))
And finally, there is the small but potent placeholder atom _ (underscore), which stands for “nothing or anything”:
(call fetch (arg _ "/users"))   ; a call to fetch with "/users"
(call _ (arg _ "/users"))       ; a call with "/users"
This is actually a keyword that has no corresponding selector, so there’s no expansion at play.
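Taken together, the expansion rules above can be sketched as a single function. expandAtom is a hypothetical helper for illustration, not esgrep's code:

```typescript
type Sexp = string | Sexp[];

// Expand a bare atom into its implied selector:
//   _        -> _ (keyword, no expansion)
//   "x"      -> (str x)
//   /x/flags -> (pat x flags)
//   x        -> (id x)
function expandAtom(atom: string): Sexp {
  if (atom === "_") return atom; // placeholder keyword: no expansion
  const str = atom.match(/^"(.*)"$/);
  if (str) return ["str", str[1]];
  const pat = atom.match(/^\/(.*)\/(\w*)$/);
  if (pat) return pat[2] ? ["pat", pat[1], pat[2]] : ["pat", pat[1]];
  return ["id", atom]; // bare atoms stand in for identifier selectors
}

// expandAtom("fetch")    -> ["id", "fetch"]
// expandAtom('"/users"') -> ["str", "/users"]
// expandAtom("/foo/i")   -> ["pat", "foo", "i"]
// expandAtom("_")        -> "_"
```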
The second primitive of the language is the list, which is denoted by a pair of parentheses (()). Lists are used to represent every operator and filter provided by the scanner, with their names listed as the first element in the list.
call -> (call)
ref  -> (ref)
Filters pertain to language features and select the target, while operators pertain to scanner features and mutate that selection.
I’m still debating whether operators should be marked with a reserved symbol, akin to Clojure with its leading : (colon) symbol:
(:ref (str))
That would give me greater room to expand the set of operators without risk of conflict with language-specific filters. At the same time, it adds complexity on the user’s end.
In contrast to previous iterations, the SEXP variant does not accept keywords for arguments. This is both good and bad. It’s not as leaky, which is good: the user doesn’t need to remember a word like quasis or such. But they still need to know where a parameter is supplied, which I still can’t find a way to avoid.
I did consider a system of inference, where a supplied value aligns to its intended parameter automatically based on its type. For example, consider the (fun) selector signature:
(fun [name] [args] [qualifiers])
Although the first parameter is legal only as a name selector, in the following query it can be predictably marked blank:
; a function with at least one argument
(fun (arg 1))
By looking at (arg) we can infer that it is only legal in the second parameter position ([args]), so it aligns as the value to that parameter. There is no ambiguity in this case, but not so in others. The inconsistency that would be caused by only situationally being able to infer, even if it is properly signalled, is not a price worth paying, in my view.
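For the sake of illustration, the inference idea could be sketched like this; the signature table, the qualifier set, and the helper names are all hypothetical:

```typescript
type Sexp = string | Sexp[];

// (fun [name] [args] [qualifiers]): the selector heads each slot accepts.
// The specific heads here are assumptions for the sketch.
const funSignature: string[][] = [["id"], ["arg"], ["async", "gen"]];
const qualifierAtoms = new Set(["async", "gen"]);

// Bare atoms stand in for (id ...) unless they are known qualifiers.
const headOf = (s: Sexp): string =>
  Array.isArray(s) ? (s[0] as string) : qualifierAtoms.has(s) ? s : "id";

// Align each supplied value to the first free slot that accepts its head.
function align(values: Sexp[], signature: string[][]): (Sexp | null)[] {
  const slots: (Sexp | null)[] = signature.map(() => null);
  for (const value of values) {
    const i = signature.findIndex(
      (accepts, j) => slots[j] === null && accepts.includes(headOf(value))
    );
    if (i === -1) throw new Error(`cannot place ${JSON.stringify(value)}`);
    slots[i] = value;
  }
  return slots;
}

// (fun (arg 1)): the (arg 1) list aligns to the second slot, [args],
// leaving [name] predictably blank.
const aligned = align([["arg", "1"]], funSignature);
// aligned is [null, ["arg", "1"], null]
```

The ambiguity the author worries about shows up as soon as two slots accept the same head, at which point the first-free-slot rule becomes an arbitrary tiebreak.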
One of the qualities I desired in the language was expansiveness: the ability to accommodate new functionality without adding to the usage complexity.
Watch how 8 different selectors can scream together into a wail of beauty:
(call (mem t
        (ref
          (call
            (import @canvas/i18n useScope))))
      (arg _ (obj (prop count))))
This query is actually from a real-life case at my previous job, saving me from likely a day’s worth of writing yet another AST scanning program to isolate the statements that were causing a bug. Calls to the function I18n.t() - which you get by instantiating an object from the useScope method exported by the @canvas/i18n package - were being made with an object that contained a count property of a non-numeric value. The value types were obviously not available to a static analyzer, as they were not literals, but what the analyzer did allow me to do was cut the 9000 calls I had to go through by hand down to a few hundred.
So, I decided to (re)build esgrep once and for all.
Parentheses… or, the elephant in the room. Formatting the SEXP query is somewhat of an awkward dance. A list expression is defined by a balanced pair of parentheses, and once you’re 5+ levels deep, it starts to get unwieldy.
There are ways to mitigate the formatting problem; it won’t be eliminated altogether, but it could perhaps be turned into a minor problem, or a non-problem. The Parinfer and aggressive-indent-mode algorithms are two such ways. The idea is to employ an intelligence layer to aid in (a) indenting, and (b) balancing parens.
Function signatures are another potential usage problem, since they expect parameters at specific positions. I can’t expect users to remember the signatures or positions, but consistency can help a great deal here.
Consistency
I tried my best to establish a few signature rules towards a consistent interface, which can be visualized as such:
[id XOR value] > [properties] > [descendants] > [qualifiers]
An id is typically expressed with (id). And in the case of nodes that are identified by their position as a descendant of another node, like arguments to a call expression, that’d be a (num).

The rules might sound arbitrary when listed like this, and they might be; I arrived at them through intuition. For example, an array literal has no name, nor properties; it has nothing, actually, outside of its value, which is its elements, or descendants. So when I found myself using the (arr) selector, my mind was expecting nothing but the elements of the array, and I declared them first:

(arr [els])
In the case of a CallExpression, the first thing that comes to mind is the name of the function being called, or the callee in formal terms. And then the arguments the call is made with. So the signature follows to become:
(call [callee] [args])
Ultimately, my guiding principle and goal is “less interface”. The transparency of the language is key to the usability of the scanner. If one way is decidedly more intuitive, it becomes the way. The less syntax and fewer rules there are in order to use it, the better.
Environment
Rules aside, editors and IDEs can also help improve usability. Something similar to VSCode’s inlay hints that would show the signature of the selector the user is currently building could go a long way.
Macros
Last but not least, on the verbosity front, and where a query tends to stretch or repeats itself: macros. User-defined macros can reduce the cognitive overhead, allow for domain-specific selectors, save keystrokes, and just make for better readability overall.
Take the query shown previously in the scalability section:
(call (mem t
        (ref
          (call
            (import @canvas/i18n useScope))))
      (arg _ (obj (prop count))))
We could redefine the callee in terms of an i18n-t! macro and store it somewhere:
(def i18n-t!
  (mem t
    (ref
      (call
        (import @canvas/i18n useScope)))))
And use it later:
(call i18n-t! (arg _ (obj (prop count))))
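Mechanically, such an expansion amounts to a tree substitution over the parsed query. Here is a sketch, with the caveat that macros are a proposal in this post and this is not esgrep's implementation:

```typescript
// Queries are modeled as parsed s-expressions:
// strings for atoms, arrays for lists.
type Sexp = string | Sexp[];

// Walk the query and splice the macro body in wherever its name appears.
function expandMacro(query: Sexp, name: string, body: Sexp): Sexp {
  if (!Array.isArray(query)) return query === name ? body : query;
  return query.map((item) => expandMacro(item, name, body));
}

// (def i18n-t! (mem t (ref (call (import @canvas/i18n useScope)))))
const i18nT: Sexp =
  ["mem", "t", ["ref", ["call", ["import", "@canvas/i18n", "useScope"]]]];

// (call i18n-t! (arg _ (obj (prop count))))
const query: Sexp =
  ["call", "i18n-t!", ["arg", "_", ["obj", ["prop", "count"]]]];

const expanded = expandMacro(query, "i18n-t!", i18nT);
// expanded is the original long-form query, with the i18n-t! atom
// replaced by the full (mem t (ref ...)) list
```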
In the beginning, I described what the ideal language host for esgrep’s queries would look like. This is how I would grade the SEXP variant on that scale:

- Expressive = 1
- Transparent = 0 (converging to 1, depending on how accustomed you are to it)
- Simple = 1
- Expansive = 1
- Succinct = 0 (parens can trip you up, and you may need multiple lines)
- Abstract = 1
Overall, I’m pleased with it. It’s a solid foundation with room to grow. Thanks for reading!
[1] There still is a reason to use the explicit forms of the selectors an atom can expand into. One such reason is to specify qualifiers, which you seldom need to do. As such, there is no hypocrisy at play in criticizing jq earlier for its overloads, as these are not overloads of that kind.↩︎