Skip to content

Conversation

@chengluyu
Copy link
Member

@chengluyu chengluyu commented Sep 16, 2025

The Updated UCS and UPS Compilation Pipelines

  • Term: If now contains SimpleSplit, which represents splits using nested patterns. Nested patterns are represented by Pattern, which is also used by pattern definitions.
  • Elaborator: Since the elaboration of split is implemented by several mutually recursive methods, I placed them in a trait called SplitElaborator to control the length of Elaborator and maintain modularity.
    • At this step, we report semantic errors that do not require context, such as mixing do and then, as well as some obvious unreachable cases, like having more branches after an else.
  • Lowering is now responsible for most of the work.
    • SimpleSplits in UCS expressions are expanded to Splits using SplitCompiler, then normalized by Normalization, and finally lowered to Block.
    • The naïve compilation (generation of unapply and unapplyStringPrefix is now done by calling SplitCompiler for each PatternDef in Lowering.
    • The efficient compilation (done by Instantiator and Compiler) is invoked by SplitCompiler when it sees @compile annotated on patterns.
    • At this step, we perform all the checks on arguments and parameters.

# Conflicts:
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/js/JSBuilder.scala
#	hkmc2/shared/src/main/scala/hkmc2/semantics/Elaborator.scala
#	hkmc2/shared/src/main/scala/hkmc2/semantics/Resolver.scala
#	hkmc2/shared/src/main/scala/hkmc2/semantics/Term.scala
#	hkmc2/shared/src/main/scala/hkmc2/semantics/ups/NaiveCompiler.scala
#	hkmc2/shared/src/test/mlscript-compile/Runtime.mjs
#	hkmc2/shared/src/test/mlscript/backlog/ToTriage.mls
#	hkmc2/shared/src/test/mlscript/codegen/ClassMatching.mls
#	hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls
#	hkmc2/shared/src/test/mlscript/handlers/NonLocalReturns.mls
#	hkmc2/shared/src/test/mlscript/handlers/RecursiveHandlers.mls
#	hkmc2/shared/src/test/mlscript/ucs/general/LogicalConnectives.mls
#	hkmc2/shared/src/test/mlscript/ups/examples/Record.mls
# Conflicts:
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala
#	hkmc2/shared/src/main/scala/hkmc2/semantics/Term.scala
#	hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Desugarer.scala
#	hkmc2/shared/src/main/scala/hkmc2/syntax/Tree.scala
#	hkmc2/shared/src/test/mlscript/backlog/UCS.mls
#	hkmc2/shared/src/test/mlscript/codegen/Do.mls
#	hkmc2/shared/src/test/mlscript/codegen/FieldSymbols.mls
#	hkmc2/shared/src/test/mlscript/syntax/WeirdBrackets.mls
#	hkmc2/shared/src/test/mlscript/ucs/hygiene/HygienicBindings.mls
#	hkmc2/shared/src/test/mlscript/ucs/hygiene/PatVars.mls
#	hkmc2/shared/src/test/mlscript/ucs/patterns/AliasPattern.mls
#	hkmc2/shared/src/test/mlscript/ucs/syntax/NestedOpSplits.mls
#	hkmc2/shared/src/test/mlscript/ups/Future.mls
#	hkmc2/shared/src/test/mlscript/ups/examples/Record.mls
#	hkmc2/shared/src/test/mlscript/ups/parametric/EtaConversion.mls
Copy link
Contributor

@LPTK LPTK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor preliminary comments.

//│ = false

pattern Exponential = ("a" | "b") ~ ("c" | "d") ~ ("e" | "f") // ~ ("g" | "h") ~ ("i" | "j") ~ ("k" | "l") ~ ("m" | "n") ~ ("o" | "p") ~ ("q" | "r") ~ ("s" | "t") ~ ("u" | "v") ~ ("w" | "x") ~ ("y" | "z")
:todo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code generated by this pattern will grow exponentially. It should be addressed in a future optimized string patterns PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not say as much in a comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I did not expect this code to grow exponentially even without compilation 🤔 How come?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I did not expect this code to grow exponentially even without compilation 🤔 How come?

Because it concatenates many disjunctions. For example, ("a" | "b") ~ ("c" | "d") ~ ("e" | "f") ~ ("g" | "h") ~ ("i" | "j") ~ ("k" | "l") has six disjunctions in a row, resulting in 2^6 = 64 branches.

The branch deduplication only works for the innermost consequents.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch.

Copy link
Contributor

@LPTK LPTK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive work, thanks!

Let's finally get this merged ASAP (event hough I have not reviewed everything in detail due to lack of time).

object B
data class C(a)

// Note: It is normalization that caused the following tests to generate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this referring to a past incorrect behavior? Where is the problem, here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now. The current behavior is incorrect. The normalization assumes all classes are mutually exclusive, causing patterns like A & B to be optimized away instead of properly testing both classes.

I clarify the comment in caabab8.

Copy link
Contributor

@LPTK LPTK Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that's a huge miscompilation issue; should be front and center in the tracking issue!

[comment moved from the wrong place]

@@ -1,4 +1,5 @@
{
"name": "mlscript",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what caused this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it in this commit: Keep name field in the lockfile stable. The reason is that npm copies the name field from package.json to the lockfile and will use the project folder's name if it's missing.

People might clone this repo to folders with different names and then run npm. So, it's necessary to keep this field to make the lockfile stable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, great! I've had this problem but didn't know it could be fixed this way 👍

@LPTK LPTK requested a review from Copilot November 7, 2025 07:53
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the representation of keyword-based infix operators in the parser by wrapping them in a Keywrd type instead of using raw Keyword.Infix values directly. This provides better type safety and consistency with how prefix keywords are handled.

Key Changes

  • Changed InfixApp constructor to use Keywrd[Keyword.Infix] instead of raw Keyword.Infix
  • Updated the parser to create Keywrd wrappers when constructing InfixApp nodes
  • Modified parse rules to use a more type-safe makeInfixRule approach
  • Added the package.json name field for the npm package
  • Updated numerous test expectations to reflect the new Keywrd wrapper in parse tree output

Reviewed Changes

Copilot reviewed 135 out of 136 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
package.json, package-lock.json Added missing name field to package metadata
hkmc2/syntax/Tree.scala Changed InfixApp to use Keywrd[Keyword.Infix] instead of bare Keyword.Infix
hkmc2/syntax/Parser.scala Updated parser to create Keywrd wrappers for infix keywords
hkmc2/syntax/ParseRule.scala Refactored infix rule generation to use type-safe makeInfixRule approach
hkmc2/semantics/BlockImpl.scala Updated one location constructing InfixApp to use Keywrd
hkmc2DiffTests/.../DiffMaker.scala Changed == to === for consistency
hkmc2DiffTests/.../BbmlDiffMaker.scala Added given clause that was missing
Test files (*.mls) Updated test expectations reflecting the parse tree changes
mlscript-compile/Runtime.mls, Runtime.mjs Renamed MatchResult to MatchSuccess for clarity

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

scrut1 = prevHandlerFrame.nextHandler.handler !== cur.handler;
if (scrut1 === true) {
prevHandlerFrame = prevHandlerFrame.nextHandler;
tmp = runtime.Unit;
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value assigned to tmp here is unused.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, thanks Copilot!!!

@LPTK LPTK merged commit ea7dc6e into hkust-taco:hkmc2 Nov 7, 2025
1 check passed
@LPTK LPTK deleted the refactoring-desugarer branch November 7, 2025 08:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants