split token value and token kind #10399

Draft
wants to merge 48 commits into base: main

48 commits
0850133
chore(es/parser): custom lexer
bvanjoi Apr 17, 2025
0917b16
chore: use parser/Lexer rather than lexer/Lexer
bvanjoi Apr 21, 2025
4e68a31
chore(es/parser): split token value and token kind
bvanjoi Apr 21, 2025
3cf2250
chore(es/parser): dont clone token value
bvanjoi Apr 21, 2025
ba6cbba
faster bin/assign op check
bvanjoi Apr 22, 2025
76d2b17
faster is_keyword/is_known_ident
bvanjoi Apr 22, 2025
78e72fb
rm swc_ecma_lexer/TokenKind in lexer state update
bvanjoi Apr 22, 2025
aca7975
rm swc_ecma_lexer/TokenType in lexer state update
bvanjoi Apr 22, 2025
363fde3
fix ci
bvanjoi Apr 22, 2025
74549b4
chore(es/parser): common lexer::comment_buffer
bvanjoi Apr 23, 2025
1919d62
chore(es/parser): common lexer::whiltespace
bvanjoi Apr 23, 2025
2d87795
chore(es/parser): common lexer::LexResult
bvanjoi Apr 23, 2025
4afc449
chore: common lexer::{Char, CharIter, CharExt}
bvanjoi Apr 23, 2025
f93d42e
chore: common syntax
bvanjoi Apr 23, 2025
09306ce
chore: common Context
bvanjoi Apr 23, 2025
7aacfdb
chore: common input::Tokens
bvanjoi Apr 23, 2025
e542337
chore: rm useless input
bvanjoi Apr 23, 2025
4ef8196
chore(es/parser): common lexer:State
bvanjoi Apr 23, 2025
377bcc3
chore(common): rm useless mutable
bvanjoi Apr 24, 2025
a429972
inline methods in TokenKind
bvanjoi Apr 24, 2025
9ad27a9
chore(es/lexer): common utils of lexer, part1
bvanjoi Apr 25, 2025
f3e27dc
chore(es/lexer): common emit_error of lexer
bvanjoi Apr 25, 2025
c1aaeea
chore(es/lexer): common `skip_line_comment`
bvanjoi Apr 25, 2025
32355d8
chore(es/lexer): common `skip_space`
bvanjoi Apr 26, 2025
673ee7f
chore(es/lexer): rm `store_comment`
bvanjoi Apr 26, 2025
dfe9356
chore(es/lexer): common number, part1
bvanjoi Apr 26, 2025
f12f62b
chore(es/lexer): common number, part2
bvanjoi Apr 26, 2025
5feadcc
chore(es/lexer): common `consume_pending_comments`
bvanjoi Apr 26, 2025
0e8619c
chore(es/lexer): common `read_jsx_word`
bvanjoi Apr 26, 2025
bf38bbc
chore(es/lexer): common `read_jsx_str`
bvanjoi Apr 26, 2025
ef31789
chore(es/lexer): common read_token, part1
bvanjoi Apr 27, 2025
273ba98
chore(es/lexer): common read_token, part2
bvanjoi Apr 27, 2025
54e7bb4
chore(es/lexer): common read_token, part3
bvanjoi Apr 27, 2025
13afeb8
chore(es/parser): common `ExprExt`
bvanjoi Apr 27, 2025
41da3f0
chore(es/parser): delete test in lexer/parser
bvanjoi Apr 27, 2025
d484bf7
chore(es/parser): common `with_state`
bvanjoi Apr 27, 2025
a8abef2
chore(es/parser): common buffer
bvanjoi Apr 28, 2025
2a117df
chore(es/parser): common `WithCtx`
bvanjoi Apr 28, 2025
fdc497c
chore(es/lexer): some traits
bvanjoi Apr 28, 2025
c88d167
chore(es/parser): common `emit_error`
bvanjoi Apr 29, 2025
24445d4
chore(es/parser): common helpers
bvanjoi Apr 29, 2025
a46dc6f
chore(es/parser): common `Verifier`
bvanjoi Apr 29, 2025
b68bdf0
chore(es/parser): common `verify_expr`
bvanjoi Apr 29, 2025
ae5d009
chore(es/parser): common `parse_lit`
bvanjoi Apr 29, 2025
f8d098b
chore(es/parser): common `parse_ident_name`
bvanjoi Apr 29, 2025
ec9ce2a
chore(es/parser): common `make_decl_declare`
bvanjoi Apr 29, 2025
1731817
chore(es/parser): common parse private name
bvanjoi Apr 30, 2025
f46cf93
chore(es/parser): common `parse_ident`
bvanjoi Apr 30, 2025
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

28 changes: 14 additions & 14 deletions crates/swc_common/src/input.rs
@@ -72,22 +72,22 @@ impl<'a> From<&'a SourceFile> for StringInput<'a> {
 }
 }

-impl Input for StringInput<'_> {
+impl<'a> Input for StringInput<'a> {
 #[inline]
-fn cur(&mut self) -> Option<char> {
+fn cur(&self) -> Option<char> {
 self.iter.clone().next()
 }

 #[inline]
-fn peek(&mut self) -> Option<char> {
+fn peek(&self) -> Option<char> {
 let mut iter = self.iter.clone();
 // https://github.com/rust-lang/rust/blob/1.86.0/compiler/rustc_lexer/src/cursor.rs#L56 say `next` is faster.
 iter.next();
 iter.next()
 }

 #[inline]
-fn peek_ahead(&mut self) -> Option<char> {
+fn peek_ahead(&self) -> Option<char> {
 let mut iter = self.iter.clone();
 // https://github.com/rust-lang/rust/blob/1.86.0/compiler/rustc_lexer/src/cursor.rs#L56 say `next` is faster
 iter.next();
@@ -107,7 +107,7 @@ impl Input for StringInput<'_> {
 }

 #[inline]
-fn cur_as_ascii(&mut self) -> Option<u8> {
+fn cur_as_ascii(&self) -> Option<u8> {
 let first_byte = *self.as_str().as_bytes().first()?;
 if first_byte <= 0x7f {
 Some(first_byte)
@@ -123,7 +123,7 @@ impl Input for StringInput<'_> {

 /// TODO(kdy1): Remove this?
 #[inline]
-fn cur_pos(&mut self) -> BytePos {
+fn cur_pos(&self) -> BytePos {
 self.last_pos
 }

@@ -133,7 +133,7 @@ impl Input for StringInput<'_> {
 }

 #[inline]
-unsafe fn slice(&mut self, start: BytePos, end: BytePos) -> &str {
+unsafe fn slice(&mut self, start: BytePos, end: BytePos) -> &'a str {
 debug_assert!(start <= end, "Cannot slice {:?}..{:?}", start, end);
 let s = self.orig;

@@ -211,7 +211,7 @@ impl Input for StringInput<'_> {
 }

 #[inline]
-fn is_byte(&mut self, c: u8) -> bool {
+fn is_byte(&self, c: u8) -> bool {
 self.iter
 .as_str()
 .as_bytes()
@@ -238,9 +238,9 @@ impl Input for StringInput<'_> {
 }

 pub trait Input: Clone {
-fn cur(&mut self) -> Option<char>;
-fn peek(&mut self) -> Option<char>;
-fn peek_ahead(&mut self) -> Option<char>;
+fn cur(&self) -> Option<char>;
+fn peek(&self) -> Option<char>;
+fn peek_ahead(&self) -> Option<char>;

 /// # Safety
 ///
@@ -251,7 +251,7 @@ pub trait Input: Clone {
 /// Returns [None] if it's end of input **or** current character is not an
 /// ascii character.
 #[inline]
-fn cur_as_ascii(&mut self) -> Option<u8> {
+fn cur_as_ascii(&self) -> Option<u8> {
 self.cur().and_then(|i| {
 if i.is_ascii() {
 return Some(i as u8);
@@ -262,7 +262,7 @@ pub trait Input: Clone {

 fn is_at_start(&self) -> bool;

-fn cur_pos(&mut self) -> BytePos;
+fn cur_pos(&self) -> BytePos;

 fn last_pos(&self) -> BytePos;

@@ -293,7 +293,7 @@ pub trait Input: Clone {
 /// `c` must be ASCII.
 #[inline]
 #[allow(clippy::wrong_self_convention)]
-fn is_byte(&mut self, c: u8) -> bool {
+fn is_byte(&self, c: u8) -> bool {
 match self.cur() {
 Some(ch) => ch == c as char,
 _ => false,
Expand Down
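
The changes to `crates/swc_common/src/input.rs` switch the read-only cursor methods (`cur`, `peek`, `peek_ahead`, `cur_as_ascii`, `cur_pos`, `is_byte`) from `&mut self` to `&self`. This is possible because they look ahead on a clone of the character iterator rather than advancing the real one. The following is a minimal std-only sketch of that pattern, not the actual `StringInput`; `MiniInput` and `advance` are hypothetical names introduced for illustration.

```rust
use std::str::Chars;

/// Hypothetical stand-in for `StringInput`; only the cursor is modelled.
#[derive(Clone)]
struct MiniInput<'a> {
    iter: Chars<'a>,
}

impl<'a> MiniInput<'a> {
    fn new(src: &'a str) -> Self {
        MiniInput { iter: src.chars() }
    }

    /// Current character. Cloning `Chars` is cheap and leaves `self` untouched,
    /// so a shared borrow is enough.
    fn cur(&self) -> Option<char> {
        self.iter.clone().next()
    }

    /// Character after the current one, again read from a throwaway clone.
    fn peek(&self) -> Option<char> {
        let mut iter = self.iter.clone();
        iter.next();
        iter.next()
    }

    /// Hypothetical helper: the only method in this sketch that really
    /// mutates the cursor, so it is the only one taking `&mut self`.
    fn advance(&mut self) {
        self.iter.next();
    }
}

fn main() {
    let mut input = MiniInput::new("ab");
    // Both calls borrow `input` immutably, so they can be combined freely.
    let (c, p) = (input.cur(), input.peek());
    println!("cur = {c:?}, peek = {p:?}");
    input.advance();
    println!("cur = {:?}, peek = {:?}", input.cur(), input.peek());
}
```

With `&self` receivers, callers that only hold a shared reference to the input can still inspect upcoming characters; only consuming operations need exclusive access.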
3 changes: 3 additions & 0 deletions crates/swc_ecma_lexer/Cargo.toml
@@ -61,6 +61,9 @@ swc_ecma_visit = { version = "8.0.0", path = "../swc_ecma_visit" }
 swc_malloc = { version = "1.2.2", path = "../swc_malloc" }
 testing = { version = "9.0.0", path = "../testing" }

+[[example]]
+name = "lexer"
+
 [[bench]]
 harness = false
 name = "lexer"
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
 use swc_common::{
 errors::{ColorConfig, Handler},
+input::StringInput,
 sync::Lrc,
 FileName, SourceMap,
 };
-use swc_ecma_parser::{lexer::Lexer, Capturing, Parser, StringInput, Syntax};
+use swc_ecma_lexer::{lexer, lexer::Lexer, Syntax};

 fn main() {
 let cm: Lrc<SourceMap> = Default::default();
@@ -19,25 +20,15 @@ fn main() {
 "function foo() {}".into(),
 );

-let lexer = Lexer::new(
+let l = Lexer::new(
 Syntax::Es(Default::default()),
 Default::default(),
 StringInput::from(&*fm),
 None,
 );

-let capturing = Capturing::new(lexer);
-
-let mut parser = Parser::new_from(capturing);
-
-for e in parser.take_errors() {
-e.into_diagnostic(&handler).emit();
-}
-
-let _module = parser
-.parse_module()
+let tokens = lexer(l)
 .map_err(|e| e.into_diagnostic(&handler).emit())
-.expect("Failed to parse module.");
-
-println!("Tokens: {:?}", parser.input().take());
+.expect("Failed to lex.");
+println!("Tokens: {tokens:?}",);
 }
69 changes: 69 additions & 0 deletions crates/swc_ecma_lexer/src/common/context.rs
@@ -0,0 +1,69 @@
bitflags::bitflags! {
#[derive(Debug, Clone, Copy, Default)]
pub struct Context: u32 {

/// `true` while backtracking
const IgnoreError = 1 << 0;

/// Is in module code?
const Module = 1 << 1;
const CanBeModule = 1 << 2;
const Strict = 1 << 3;

const ForLoopInit = 1 << 4;
const ForAwaitLoopInit = 1 << 5;

const IncludeInExpr = 1 << 6;
/// If true, await expression is parsed, and "await" is treated as a
/// keyword.
const InAsync = 1 << 7;
/// If true, yield expression is parsed, and "yield" is treated as a
/// keyword.
const InGenerator = 1 << 8;

/// If true, await is treated as a keyword.
const InStaticBlock = 1 << 9;

const IsContinueAllowed = 1 << 10;
const IsBreakAllowed = 1 << 11;

const InType = 1 << 12;
/// Typescript extension.
const ShouldNotLexLtOrGtAsType = 1 << 13;
/// Typescript extension.
const InDeclare = 1 << 14;

/// If true, `:` should not be treated as a type annotation.
const InCondExpr = 1 << 15;
const WillExpectColonForCond = 1 << 16;

const InClass = 1 << 17;

const InClassField = 1 << 18;

const InFunction = 1 << 19;

/// This indicates current scope or the scope out of arrow function is
/// function declaration or function expression or not.
const InsideNonArrowFunctionScope = 1 << 20;

const InParameters = 1 << 21;

const HasSuperClass = 1 << 22;

const InPropertyName = 1 << 23;

const InForcedJsxContext = 1 << 24;

// If true, allow super.x and super[x]
const AllowDirectSuper = 1 << 25;

const IgnoreElseClause = 1 << 26;

const DisallowConditionalTypes = 1 << 27;

const AllowUsingDecl = 1 << 28;

const TopLevel = 1 << 29;
}
}
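
The new `Context` type packs the parser state into a single `u32`, one bit per flag, and derives `Copy`, so saving and restoring a context is a plain integer copy. The sketch below illustrates the idea with a hand-rolled wrapper so that it compiles with the standard library alone; it is not the `bitflags`-generated type from the diff. `Ctx`, `with`, `without`, and `await_is_keyword` are hypothetical names; the bit positions and the "`await` is treated as a keyword" rule are taken from the flag definitions and doc comments above.

```rust
/// Hypothetical std-only stand-in for the `bitflags`-generated `Context`.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
struct Ctx(u32);

impl Ctx {
    // Bit positions copied from the flag list in the diff.
    const STRICT: Ctx = Ctx(1 << 3);
    const IN_ASYNC: Ctx = Ctx(1 << 7);
    const IN_STATIC_BLOCK: Ctx = Ctx(1 << 9);

    /// True if every bit of `other` is set in `self`.
    fn contains(self, other: Ctx) -> bool {
        self.0 & other.0 == other.0
    }

    /// Returns a copy with the bits of `other` switched on.
    fn with(self, other: Ctx) -> Ctx {
        Ctx(self.0 | other.0)
    }

    /// Returns a copy with the bits of `other` switched off.
    fn without(self, other: Ctx) -> Ctx {
        Ctx(self.0 & !other.0)
    }
}

/// Hypothetical helper: per the doc comments, `await` is a keyword
/// inside async code and inside static blocks.
fn await_is_keyword(ctx: Ctx) -> bool {
    ctx.contains(Ctx::IN_ASYNC) || ctx.contains(Ctx::IN_STATIC_BLOCK)
}

fn main() {
    let outer = Ctx::default().with(Ctx::STRICT);
    // Entering an async function derives a new context; `outer` is unchanged
    // because `Ctx` is `Copy`.
    let inner = outer.with(Ctx::IN_ASYNC);
    println!("outer: {}", await_is_keyword(outer));
    println!("inner: {}", await_is_keyword(inner));
    // Leaving the function restores the previous flags.
    println!("restored == outer: {}", inner.without(Ctx::IN_ASYNC) == outer);
}
```

Because the whole state is one integer, a parser that needs to backtrack can keep the old value on the stack and assign it back, with no allocation involved.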
48 changes: 48 additions & 0 deletions crates/swc_ecma_lexer/src/common/input.rs
@@ -0,0 +1,48 @@
use swc_common::BytePos;
use swc_ecma_ast::EsVersion;

use super::{context::Context, syntax::Syntax};
use crate::{error::Error, lexer};

/// Clone should be cheap if you are parsing typescript because typescript
/// syntax requires backtracking.
pub trait Tokens<TokenAndSpan>: Clone + Iterator<Item = TokenAndSpan> {
fn set_ctx(&mut self, ctx: Context);
fn ctx(&self) -> Context;
fn syntax(&self) -> Syntax;
fn target(&self) -> EsVersion;

fn start_pos(&self) -> BytePos {
BytePos(0)
}

fn set_expr_allowed(&mut self, allow: bool);
fn set_next_regexp(&mut self, start: Option<BytePos>);

fn token_context(&self) -> &lexer::TokenContexts;
fn token_context_mut(&mut self) -> &mut lexer::TokenContexts;
fn set_token_context(&mut self, _c: lexer::TokenContexts);

/// Implementors should use Rc<RefCell<Vec<Error>>>.
///
/// It is required because parser should backtrack while parsing typescript
/// code.
fn add_error(&self, error: Error);

/// Add an error which is valid syntax in script mode.
///
/// This errors should be dropped if it's not a module.
///
/// Implementor should check for if [Context].module, and buffer errors if
/// module is false. Also, implementors should move errors to the error
/// buffer on set_ctx if the parser mode become module mode.
fn add_module_mode_error(&self, error: Error);

fn end_pos(&self) -> BytePos;

fn take_errors(&mut self) -> Vec<Error>;

/// If the program was parsed as a script, this contains the module
/// errors should the program be identified as a module in the future.
fn take_script_module_errors(&mut self) -> Vec<Error>;
}
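
The doc comments on `Tokens` ask implementors to keep errors in an `Rc<RefCell<Vec<Error>>>` and to buffer module-mode errors until the input is known to be a module. A rough std-only sketch of one way to satisfy those comments is shown below; it is not the implementation in this PR. `MiniTokens`, its fields, and `set_module` are hypothetical, `String` stands in for `Error`, and a plain `bool` replaces the `Context` module flag that the real trait updates through `set_ctx`.

```rust
use std::{cell::RefCell, rc::Rc};

/// Hypothetical token source; only the error handling is modelled.
#[derive(Clone)]
struct MiniTokens {
    is_module: bool,
    // Shared buffers: clones made for backtracking report into the same Vec.
    errors: Rc<RefCell<Vec<String>>>,
    module_errors: Rc<RefCell<Vec<String>>>,
}

impl MiniTokens {
    fn new(is_module: bool) -> Self {
        MiniTokens {
            is_module,
            errors: Rc::default(),
            module_errors: Rc::default(),
        }
    }

    /// Takes `&self`, like `Tokens::add_error`, hence the `RefCell`.
    fn add_error(&self, error: &str) {
        self.errors.borrow_mut().push(error.to_string());
    }

    /// Reports immediately in module mode, otherwise parks the error.
    fn add_module_mode_error(&self, error: &str) {
        if self.is_module {
            self.add_error(error);
        } else {
            self.module_errors.borrow_mut().push(error.to_string());
        }
    }

    /// Hypothetical stand-in for `set_ctx`: switching to module mode
    /// promotes the parked errors to real ones.
    fn set_module(&mut self, is_module: bool) {
        if is_module && !self.is_module {
            let mut pending = self.module_errors.borrow_mut();
            self.errors.borrow_mut().append(&mut pending);
        }
        self.is_module = is_module;
    }

    fn take_errors(&mut self) -> Vec<String> {
        self.errors.take()
    }
}

fn main() {
    let mut tokens = MiniTokens::new(false);
    // A speculative clone, as a backtracking parser might create.
    let snapshot = tokens.clone();
    snapshot.add_error("unexpected token");
    snapshot.add_module_mode_error("`await` used as identifier");
    println!("before: {}", tokens.errors.borrow().len());
    tokens.set_module(true);
    println!("after: {:?}", tokens.take_errors());
}
```

Sharing the buffers through `Rc` keeps `Clone` cheap, which matches the note on the trait that cloning should be inexpensive when parsing TypeScript.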