The rampart-treesitter module¶
Preface¶
Acknowledgment¶
The rampart-treesitter module is built on the Tree-sitter parser generator and incremental parsing library, originally created by Max Brunsfeld. The authors of Rampart extend our thanks to the Tree-sitter authors and contributors and to the individual maintainers of each bundled grammar.
The module bundles the following language grammars, sourced from the official tree-sitter organization where available and well-maintained community repositories otherwise:
| Language | Repository | Notes |
|---|---|---|
| JavaScript | tree-sitter/tree-sitter-javascript | Also linked into the rampart binary for the transpiler |
| C | tree-sitter/tree-sitter-c | |
| C++ | tree-sitter/tree-sitter-cpp | |
| Python | tree-sitter/tree-sitter-python | |
| Java | tree-sitter/tree-sitter-java | |
| Go | tree-sitter/tree-sitter-go | |
| Rust | tree-sitter/tree-sitter-rust | |
| TypeScript / TSX | tree-sitter/tree-sitter-typescript | Two grammars in a single repository |
| C# | tree-sitter/tree-sitter-c-sharp | |
| Ruby | tree-sitter/tree-sitter-ruby | |
| Bash | tree-sitter/tree-sitter-bash | |
| Kotlin | fwcd/tree-sitter-kotlin | Community-maintained |
| PHP | tree-sitter/tree-sitter-php |
The php_only
variant for pure-PHP files
|
| Swift | alex-pinkus/tree-sitter-swift | Community-maintained |
| Lua | tree-sitter-grammars/tree-sitter-lua | |
| Dart | UserNobody14/tree-sitter-dart | Community-maintained |
| Scala | tree-sitter/tree-sitter-scala | |
| Haskell | tree-sitter/tree-sitter-haskell | |
| OCaml | tree-sitter/tree-sitter-ocaml |
The ocaml
(.ml)
implementation grammar
|
| CSS | tree-sitter/tree-sitter-css | |
| YAML | ikatyang/tree-sitter-yaml |
Parse-only (no extractSymbols support)
|
| TOML | tree-sitter-grammars/tree-sitter-toml | Parse-only |
| Markdown | tree-sitter-grammars/tree-sitter-markdown | Parse-only (block + inline sub-grammars) |
| Elixir | elixir-lang/tree-sitter-elixir | Parse-only |
License¶
The Tree-sitter library is licensed under the MIT License. Individual
grammars are licensed under MIT or comparable permissive licenses by their respective authors.
See the LICENSE
files under each extern/tree-sitter/tree-sitter-<lang>/ directory in the Rampart
source tree.
The rampart-treesitter module is released under the MIT license.
What does it do?¶
The rampart-treesitter module exposes language-aware source code parsing to Rampart JavaScript. It supports twenty-six languages out of the box and provides two access patterns:
- A high-level
extractSymbolsfunction that returns a flat list of top-level definitions (functions, classes, structs, type declarations, and similar constructs) for a given source file. This is the primary tool for building searchable code indexes, navigation tooling, and language-aware recall systems. - A lower-level
parsefunction that returns the entire abstract syntax tree as a nested JavaScript object. Callers may walk this tree to perform custom static analysis, lint checks, refactoring transforms, or any work that does not fit the flat symbol-extraction model.
How does it work?¶
Tree-sitter is a parser-generator framework that produces a concrete syntax tree from source
code. Unlike line-oriented or regex-based parsers, Tree-sitter produces a structured tree that
preserves the grammatical relationships in the source. The library is fault-tolerant: it
returns a usable tree even when the source contains syntax errors, marking the affected regions
with ERROR nodes
rather than aborting.
Each supported language is implemented as a compiled grammar that Tree-sitter links against its
runtime. The grammars are bundled inside rampart-treesitter.so itself, so
users of the module need no external installation.
The extractSymbols
function dispatches to a per-language table of “definition node types” (for example
function_declaration, class_definition, struct_specifier) and walks the
parsed tree, emitting one record for every match.
The parse function
performs no such filtering and returns the full tree as a nested object.
Loading and Using the Module¶
Loading¶
Loading the module is a simple matter of using the require() function:
var ts = require("rampart-treesitter");
Properties¶
languages¶
A read-only Array of String values listing every bundled grammar by name. Any name in this array may be passed as the
languageargument toparse. The subset for whichextractSymbolsis supported is documented per grammar in the Acknowledgment table above; passing a parse-only name (such as"yaml") toextractSymbolswill throw.Example:
var ts = require("rampart-treesitter"); console.log(ts.languages); /* ["javascript", "c", "cpp", "python", "java", "go", "rust", "typescript", "tsx", "csharp", "ruby", "bash", "kotlin", "php", "swift", "lua", "dart", "scala", "haskell", "ocaml", "css", "yaml", "toml", "markdown", "markdown_inline", "elixir"] */
Functions¶
extractSymbols¶
The
extractSymbolsfunction parses a source string with the named grammar and returns a flat list of top-level definitions (functions, classes, methods, struct declarations, type aliases, and other language-specific constructs).Usage:
var ts = require("rampart-treesitter"); var result = ts.extractSymbols(source, language[, options]);Where:
sourceis a String, the source code to be parsed.languageis a String, one of the supported language names returned ints.languages. Passing a grammar that is bundled forparseonly ("yaml","toml","markdown","markdown_inline","elixir") will throw a descriptive error.optionsis an Object with the following optional property:
strict- A Boolean. Whentrue, throws an error if the tree contains anyERRORnode anywhere. Defaultfalse: the function returns whatever symbols could be extracted from the partial tree, and the caller may inspect the returnedhasErrorsflag to know whether the parse was clean.
- Return Value:
An Object with two properties:
symbols- An Array of Object values, one per extracted definition. Each entry has the following shape:
name- A String, the identifier of the symbol. Falls back to"(anonymous)"when no name can be resolved (for example, an anonymous struct in a Ctypedef).kind- A String, the Tree-sitter node type that produced the match (for example,"function_definition","class_declaration","trait_item").line- A Number, the 1-based line on which the symbol begins.column- A Number, the 1-based column.signature- A String, the source text covering the symbol’s node, truncated at the first{or after 256 bytes (whichever comes first). Suitable for display and for indexing.startByte- A Number, the byte offset intosourcewhere the symbol begins.endByte- A Number, the byte offset where the symbol ends.hasErrors- A Boolean,trueif the parsed tree contains anyERRORnode. Whentrue, the symbol list may be incomplete; tree-sitter’s error-recovery may have skipped sections of the source.
parse¶
The
parsefunction returns the entire abstract syntax tree for the given source as a nested JavaScript object. Use this for custom analyses that fall outside the flat symbol-extraction model.Usage:
var ts = require("rampart-treesitter"); var tree = ts.parse(source, language[, options]);Where:
sourceis a String, the source code to be parsed.languageis a String, one of the supported language names returned ints.languages. UnlikeextractSymbols,parseaccepts every bundled grammar, including the parse-only ones.optionsis an Object with the following optional properties:
strict- A Boolean. Whentrue, throws an error if the tree contains anyERRORnode. Defaultfalse.includeText- A Boolean. Whentrue, attaches the source slice spanned by each node as atextproperty. Defaultfalse: callers may reproduce the slice usingsource.substring(node.startByte, node.endByte). Omitted by default to keep the result compact for large files.includeUnnamed- A Boolean. Whentrue, includes anonymous nodes (keywords, punctuation, operators) as children. Defaultfalse: only named (semantic) nodes are returned, which is what most callers want.
- Return Value:
An Object representing the root node, recursively containing its children. Each node has the following shape:
type- A String, the Tree-sitter node type (for example,"program","function_definition","identifier").line- A Number, the 1-based starting line.column- A Number, the 1-based starting column.startByte- A Number, the byte offset where the node begins.endByte- A Number, the byte offset where the node ends.isError- A Boolean,trueif this specific node is anERRORnode.hasError- A Boolean,trueif the subtree rooted at this node contains anyERRORnode.children- An Array of node Object values, the named children of this node (or all children ifincludeUnnamedis set).text- A String, the full source slice covering this node. Present only whenincludeTextistrue.
Examples¶
Extracting Symbols¶
var ts = require("rampart-treesitter");
var src =
"function add(a, b) {\n" +
" return a + b;\n" +
"}\n" +
"class Greeter {\n" +
" greet(name) { return \"hi \" + name; }\n" +
"}\n";
var result = ts.extractSymbols(src, "javascript");
console.log("hasErrors:", result.hasErrors);
result.symbols.forEach(function(s) {
console.log(s.kind, "'" + s.name + "' at line", s.line);
});
/* expected output:
hasErrors: false
function_declaration 'add' at line 1
class_declaration 'Greeter' at line 4
method_definition 'greet' at line 5
*/
Inspecting the AST¶
var ts = require("rampart-treesitter");
var tree = ts.parse("int add(int a, int b) { return a + b; }", "c");
console.log("root type:", tree.type);
console.log("root spans bytes", tree.startByte, "to", tree.endByte);
/* Walk the tree and print every named node */
function walk(node, depth) {
var pad = "";
for (var i = 0; i < depth; i++) pad += " ";
console.log(pad + node.type +
" @ " + node.line + ":" + node.column);
node.children.forEach(function(c) { walk(c, depth + 1); });
}
walk(tree, 0);
Handling Parse Errors¶
By default, both functions accept broken input and surface what they could parse, setting
hasErrors (or
hasError on
parse’s root) so the
caller can detect the partial result:
var ts = require("rampart-treesitter");
var src = "function foo() { /* missing close brace */";
var result = ts.extractSymbols(src, "javascript");
if (result.hasErrors) {
console.log("warning: source has parse errors; results may be incomplete.");
}
console.log("Got", result.symbols.length, "symbols.");
Use strict: true when the caller needs to reject malformed input rather than continue
with a partial parse:
var ts = require("rampart-treesitter");
try {
ts.extractSymbols("function foo() { /* broken */", "javascript",
{ strict: true });
} catch (e) {
console.log("caught:", e.message);
}
Parsing a Configuration File¶
The parse-only grammars (yaml, toml, markdown, markdown_inline, elixir) are not exposed through extractSymbols but are fully
usable through parse. This example extracts the top-level keys of a YAML document:
var ts = require("rampart-treesitter");
var doc =
"name: example\n" +
"version: 1.0\n" +
"tags:\n" +
" - foo\n" +
" - bar\n";
var tree = ts.parse(doc, "yaml", { includeText: true });
/* tree.type === "stream" ; walk into block_node → block_mapping
to find the document's top-level keys */