TODO now!:
- Add JVM function types
- Generate type descriptors as strings
- Read class files and keep a map of function name to signature (copy descriptor strings, skip building them?)
- Semantic opcode generation functions
- Keep track of the stack & locals => Might need to do that at a higher level e.g. while working on the AST/IR
- Call functions generically (no builtin) from the same class file
- Call functions generically (no builtin) from other class files
- Define functions
- Generate stack map tables
- Use latest jvm version
- Makefile/build.sh
- Log memory used in arena
- Write non-trivial program with the API (opcode generation functions)
- Compute class file name
- Naive register (e.g. locals) allocation
- Type checking, no inference
- Local variables
- Comments (skipped by the lexer)
- Local variable mutation
- Function definition
- String literals
- Grouping
- Long
- Refactor/rename stuff
- Add asm operations that do the right thing based on the locals/stack types (e.g. add: iadd | fadd | ladd | dadd)
- Byte
- Short
- Control flow: If
- Logical operator !
- Comparison operators <,<=,>,>=,==,!=
- Logical operators (and, or)
- Control flow: While
- Control flow: Return
- Checks around return
- Move types to the resolver
- Recursion (mutual recursion?)
- Read .class, .jar, files in classpath for stdlib and such - only keep required data, don't read everything in the class path for efficiency
- Read .jmod files
- Add class path CLI option
- Scan known locations for jmod files
- Convert jvm types to kotlin types when reading .class, .jar, .jmod
- Use scratch arena when reading .class, .jar, .jmod files
- Merge functions to read .jmod, .jar if possible
- Avoid duplicating method resolution in the resolver and the lowerer (descriptor)
- Heap dump on Linux, tracking of call stack during allocations.
- Default imports
- Log the file/line of the function that was resolved
- Resolve free functions by building candidate sets
- Trivial inline (no jumps, no exceptions, etc)
- Remove builtin println
- Split `string_t` into `string_builder_t` and `string_t` (immutable, allows for trivial equality comparison with interning)
- Use a pg_array in the constant pool
- Heap dump on Linux with function names (instead of addresses)
- Constant pool deduplication
- Hash every string in ty_type_t
- Move `resolver->types` to a hash trie
- Decode UCS-2 Strings in class files (in constant pool)
- Field access
- Explicit casts
- Char
- Double, Float
- Control flow: Continue
- Control flow: Break
- Control flow: For (?)
- Control flow: When
- Control flow: Do-while
- Multiple files - what about ordering and type hole filling?
- Defend against integer overflows
- Hex/other number literals
- Hashes/Hashtables in judicious places in the compiler (strings, types?)
- Heap dump on other OSes
- Heap dump on other OSes with function names (instead of addresses)
- Union/intersection of integer types for integer literals => constraint solver for type inference inside a function body!
- Package name
- Imports
- Replace all `pg_assert` (i.e. `__builtin_trap()`) by either:
  - A user-friendly assert that prints the file, line, backtrace, error message, bug report link, and expression that failed (maybe even a core dump?)
  - A Kotlin compile error (e.g. for syntax that is not yet supported or invalid jar/jmod/class files)
- Fuzz (especially jar/jmod/class files)
- Call Fully Qualified Name (FQN)
- Call class constructor
- Call class method
- Access class field
- Class definition (BIG!)
  - Fields
  - Primary constructor
  - Secondary constructor
  - Methods
  - Static methods
  - Static fields
- Basic static analysis
  - Unused variables
  - Unreachable code (might require SSA/CFG?)
  - Mutable variables read from but never written to
  - Endless recursion
  - Redundant conditions e.g. Byte > 128
  - Redundant if-then-else branches e.g. `if (false) 1 else if (true) 2 else 3`
  - All paths return a value in a function
  - Switch (`when`): All cases are covered
  - Switch (`when`): No redundant cases
  - ...
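The item above about asm operations that pick the right instruction from the operand types (e.g. add: iadd | fadd | ladd | dadd) could be sketched roughly as follows. The type tags and the function name `opcode_add_for_type` are hypothetical and only serve the illustration; the opcode values are the ones from the JVM specification.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical type tags; the compiler's real type representation may differ.
typedef enum {
  TYPE_KIND_INT, // Also covers byte/short/char/boolean, which are ints on the JVM operand stack.
  TYPE_KIND_LONG,
  TYPE_KIND_FLOAT,
  TYPE_KIND_DOUBLE,
} type_kind_t;

// Opcode values from the JVM specification.
enum {
  OPCODE_IADD = 0x60,
  OPCODE_LADD = 0x61,
  OPCODE_FADD = 0x62,
  OPCODE_DADD = 0x63,
};

// Pick the `add` instruction matching the type of the two operands
// currently on top of the operand stack.
static uint8_t opcode_add_for_type(type_kind_t kind) {
  switch (kind) {
  case TYPE_KIND_INT:
    return OPCODE_IADD;
  case TYPE_KIND_LONG:
    return OPCODE_LADD;
  case TYPE_KIND_FLOAT:
    return OPCODE_FADD;
  case TYPE_KIND_DOUBLE:
    return OPCODE_DADD;
  }
  assert(0 && "unreachable: unknown type kind");
  return 0;
}
```

The same pattern would extend to sub, mul, div, comparisons, loads and stores, so that the callers only deal with one semantic operation per construct.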
Later:
- Do not hold on to constant pool strings from .jmod/.class/.jar files that are not useful (e.g. used for CONSTANT_POOL_KIND_CLASS_INFO, etc) to reduce memory usage
- Heap dump as pprof format (?)
- Read kotlin metadata in class files (protobuf)
- Full-fledged type inference
- High level APIs for the driver
- Generate line tables
- Generate full debug information
- Generate exceptions table
- Out-of-order declarations
- Bit operators
- Interfaces
- Using generics (BIG!)
- Defining generics (BIG!)
- Nullability checks
- Output jar file with all the classes inside
Probably much later (not necessary for an MVP):
- Infix functions
- Kdoc in comments
- Type flow (e.g. `if (x is String) x + "foo"`)
- Unicode identifiers
- Function names with spaces and backticks
- Ranges
- Vararg
- Tailrec
- Operator overloading
- Data class
- Raw (multiline) strings
- Nested (interpolation) strings
- Property delegate
- Lazy/lateinit
- Multi-threading stuff (volatile, synchronized, etc)
- Annotation
- Complicated OOP stuff (companion objects, singleton, extension methods, etc)
- Async stuff (suspend, etc)
- Java <-> Kotlin interop e.g. @JvmName, etc
- Runtime reflection
- Maybe: multi thread the compiler
- Maybe: implement/vendor libzip
- Optimize size of `allocation_metadata_t`
`mv` and `bv` are versions which are not interesting. `k` or `kind` is an enum value. 1: Class, 2: File.
- `d1` contains protobuf encoded data:
  - Length-prefixed `StringTableTypes`: list of records and list of local names. `predefined_index` in a Record is an index in the list `PREDEFINED_STRINGS` inside the kotlin compiler, e.g. `8` is `kotlin.Int`.
  - Depending on `k`:
    - If `k` is 1: `Class`.
    - If `k` is 2: `Package`. `Package` contains a list of functions. Each function has a `name` field which is an index into the `d2` array of strings (?) and a return type whose field `class_name` is an index in the string table types (?).
- `d2`: Array of strings e.g. function names.
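Reading a length-prefixed protobuf message such as `StringTableTypes` starts with decoding its length. A minimal sketch, assuming the prefix is a standard protobuf base-128 varint and that the bytes of `d1` have already been decoded from their string form; the function name `varint_decode` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Decode a protobuf base-128 varint from `buf` (at most `len` bytes).
// Each byte carries 7 bits of payload, least significant group first;
// the high bit is set on every byte except the last one.
// Returns the number of bytes consumed, or 0 if the input is truncated
// or longer than the 10 bytes a 64-bit value can take.
static size_t varint_decode(const uint8_t *buf, size_t len, uint64_t *out) {
  uint64_t value = 0;
  for (size_t i = 0; i < len && i < 10; i++) {
    value |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
    if ((buf[i] & 0x80) == 0) {
      *out = value;
      return i + 1;
    }
  }
  return 0;
}
```

The returned byte count tells the caller where the message body starts, and the decoded value tells it where the body ends, which is also where the `Class` or `Package` message that follows would begin.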
- Defined in `libraries/stdlib/jvm/src/kotlin/io/Console.kt` are public inline functions with the annotation `@kotlin.internal.InlineOnly`.
- Compiled to private static final functions (thanks to the annotation?) on the class `ConsoleKt` with the runtime invisible annotation: `kotlin.internal.InlineOnly`.
- Thus cannot be used from Java as-is e.g. `kotlin.io.ConsoleKt.println(3);`.
- Can be used from Kotlin with the compiler copying ('inlining') the code when calling `kotlin.io.println(3)`. Thus there is no `ConsoleKt` class for the kotlinc compiler and `kotlin.io.ConsoleKt.println(3)` does not work.
- The compiler has a special case for `@InlineOnly` annotated functions, which are private in the bytecode but are considered public in kotlin code.
- Class file size
- Optimizing generated code (for now, although it could be fun and there is a lot of low-hanging fruit, e.g. constant propagation and Control Flow Graph (CFG)).
- Smartness
- Non JVM backends
Primary target audience: developers with medium to large projects that are slow to compile. Secondary target audience: developers using Kotlin but not IntelliJ who need good CLI tooling (later: formatting, LSP).
- Fast compile times (ideally < 1s for small to medium projects, < 10s for large projects). Target: 1M LOC/s.
- Fast import times (of bytecode). Target: 1G/s (with SSD).
- Generated code speed and size are within 2-10x of the code generated by the official compiler.
- Understandable error messages, possibly great ones
- Small executable size for efficient CI downloading
- No dependencies. Possible exception: libzip to read jar files, but linked statically.
- Major platforms supported (including Windows :| )
- 'Dumb' codebase
$ java -Xlog:verification=trace DemoKt