-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
split token value and token kind #10399
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
CodSpeed Performance ReportMerging #10399 will degrade performances by 3.23%Comparing Summary
Benchmarks breakdown
|
2c80e1a
to
c2b1b12
Compare
The performance gains demonstrate a compelling case for implementation. Let me explain my idea:
@kdy1 Do you agree with these changes? If so, I will fix CI and make it ready for review. |
I have seen the two lexers and two parsers in |
Yes, I think so. Perhaps we can find a better way to integrate it. |
Why do we need two separate parser/lexer/token types? |
The Additionally, the
Since the lexical analysis implementations differ,
Using a single parser to handle different lexical logic introduces unnecessary development complexity. While I've separated them for now, I believe there should be a better way to consolidate this logic. |
I'm not sure why |
I guess this is the root cause of the splitting. But I'm not too clear about it. Could you please provide more detailed examples? |
|
Since both of |
I understood, but I think the gain is too small if we have to maintain two tokens/lexers/parsers. My guess is that #10397 is the sweet point, but not sure. |
The
I suggest considering landing this change for three reasons:
|
I'm not too clear about
Do you mean copy the new Lexer and Parser, optimize on top of the new ones, and replace the original ones once they are completed? If not, why can't do so? |
Consider the tokenization of
The primary purpose of (Alternatively, we might consider eventually removing
I'm having difficulty understanding the intended meaning here. Could you please clarify? |
Learnt, and I agree this is a very good idea.
I see. Does this mean
Like |
Certainly not - the current token and parser state combination is sufficient for producing a correct AST. This approach aligns with many established parser implementations, including TypeScript's.
I agree with this perspective in principle, but I don't believe creating a separate crate would actually reduce the workload. My primary goal is to optimize To summarize these changes:
|
Ok, I got the idea and I like it. Would it be possible to use a single struct with const generic or generic to make maintenance easy? I'm worried about the maintenance burden. |
Agree. I expect finally we should eliminate all duplication and get a clear lexer and a clear parser (just like now, and faster), but we should still keep the code in the process as maintainable as possible, which also bring less burden for bug fixing. |
If it's inevitable at the moment, I think we can merge this PR as is and refactor these with other PRs, but I think we should at least have concrete plan for future deduplication. |
I am also worried about the maintenance cost: At present, it seems troublesome to abstract
What do you think? |
It sounds acceptable for now, but I want to hear opinion from @CPunisher, too. |
767aa63
to
be3e65f
Compare
This constitutes a substantial code revision. We should first evaluate whether it yields measurable performance enhancements.