116 changes: 116 additions & 0 deletions content/blog/2025-05-13-SCIF.md
@@ -0,0 +1,116 @@
+++
title = "Implementing Multiple Contracts and Error Messaging For a Compiler for Smart Contracts"
[extra]
bio = """
Noah Schiff is a just-graduated undergrad / CS MEng student interested in applied programming languages and optimizing software engineering.<br>
Kabir Samsi is a third-year undergrad, interested in building programming languages and compilers that can
target new areas.<br>
Stephanie Ma is a first-year MS student interested in PL and compilers.
"""
latex=true
[[extra.authors]]
name = "Noah Schiff"
[[extra.authors]]
name = "Kabir Samsi"
[[extra.authors]]
name = "Stephanie Ma"
+++

## Background

Our project focuses on improving and adding new features to the compiler for [SCIF](https://arxiv.org/pdf/2407.01204), a language for writing smart contracts with secure control flow. SCIF uses information flow and its type system to help prevent control-flow attacks and make smart contracts more secure.

As both a language design and implementation paper, the [SCIF technical report](https://arxiv.org/abs/2407.01204) extensively discusses the SCIF compiler and the correctness and performance of the Solidity code it generates for SCIF programs. As such, its authors hope to eventually publish the compiler as a research artifact similar to what is required for a [PLDI Research Artifact](https://pldi25.sigplan.org/track/pldi-2025-pldi-research-artifacts) or [OOPSLA Artifact](https://2025.splashcon.org/track/splash-2025-oopsla-artifacts). The [ACM discusses different badges](https://www.acm.org/publications/policies/artifact-review-and-badging-current) an artifact can be awarded. The goal for SCIF is that any open source contributor can easily set up and run the compiler, and that the compiler can easily validate the results described in papers.

In our project, we focused on a few primary aspects: adding the much-desired ability to define **multiple contracts in one file** and making this work with the compiler's current control flow and functionality; improving the quality of error messaging for malformed files; and improving the structure of the compiler's build system.
Owner

> and allowing this to work with our the compiler's current control flow and functionality

I'm not sure what this means… is there something more specific you can say about the requirements here?

Owner

> improving the quality of error messaging for malformed files

Unless I somehow missed it, the rest of your post does not discuss this. Please either omit it from this list (if you did not actually do it) or add a section about it.


The existing compiler is frustrating to set up and finicky to run. Furthermore, potential contributors find it hard to know whether a change introduces a regression or an improvement. We aim to improve the experience for users and contributors.
Owner

Is this a separate goal from the concrete ones listed above, or is it just context for the goals you have already listed? Please try to keep it specific—it can be confusing to state general goals that aren't clearly tied to specific objectives.


## Implementation and Features

### Multiple Contracts

A significant push involved extending the compiler to handle multiple contracts properly. We implemented a **parsing** phase by extending the language's grammar, and then extended SCIF's typechecker and compilation to handle multiple contracts defined in the same file.

**Imports and Multiple Contracts**

SCIF syntax currently enables programmers to define, in a single file, either a **contract** or an **interface**, whose relationship is roughly analogous to that of a class and an interface in Java. Interfaces specify signatures that contracts can later implement.

More importantly, interface and contract files can **import** definitions from other files, that is, other contracts and interfaces. This is done with a series of `import` statements at the top of the file. Previously, each file corresponded to exactly one contract or interface, so an import cleanly brought the relevant definition into scope.

An important existing check verifies that there are no circular imports, that is to say, that no file directly or transitively imports itself.
This was verified by building a [topological ordering](https://en.wikipedia.org/wiki/Topological_sorting) of imported file names, and rejecting a program whose imports formed a cycle in the corresponding graph.
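
As a minimal sketch of the check described above, assuming a hypothetical `imports` dictionary from each file name to the files it imports, Python's standard `graphlib` module can compute the ordering and report a cycle. The function and file names are illustrative only and are not taken from the SCIF compiler, which is written in Java.

```python
from graphlib import CycleError, TopologicalSorter


def import_order(imports):
    """Return file names so that every file comes after the files it imports.

    `imports` maps a file name to the list of files it imports (a
    hypothetical input shape). Raises ValueError if the imports form a cycle.
    """
    try:
        # TopologicalSorter treats the mapped values as predecessors, so
        # imported files are emitted before the files that import them.
        return list(TopologicalSorter(imports).static_order())
    except CycleError as err:
        # err.args[1] is the list of nodes forming the detected cycle.
        raise ValueError(f"circular import: {err.args[1]}") from err


acyclic = {"a.scif": ["b.scif", "c.scif"], "b.scif": ["c.scif"], "c.scif": []}
order = import_order(acyclic)

cyclic = {"a.scif": ["b.scif"], "b.scif": ["a.scif"]}
try:
    import_order(cyclic)
    cycle_rejected = False
except ValueError:
    cycle_rejected = True
```

Here `order` lists `c.scif` before `b.scif` before `a.scif`, and the second graph is rejected because the two files import each other.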

Updating this to work with a one-to-many relationship between files and contracts was a new challenge.

**Parsing**

SCIF presently uses [CUP](https://www2.cs.tum.edu/projects/cup/) as its parser generator. Previously, the grammar allowed any number of imports, followed by either a single contract definition or a single interface definition; this would be parsed as a `SourceFile`, a superclass of both `ContractFile` and `InterfaceFile`.

Our new infrastructure parses a single source file containing multiple contracts into a list of `SourceFiles`. It does so by first creating a new term, `SourceFiles`, without imports defined. We then
Owner

Sourcefiles -> SourceFiles
Missing space after the "."

Owner

It seems weird that there is a thing called SourceFile, which sounds singular, but there are actually multiple SourceFiles per actual file?

map all defined imports for a given file onto each of the generated files, one per contract. Notably, while we previously implemented a one-to-one `(Filename, ContractFile)` mapping, we now implement a `(Filename, List<ContractFile>)` mapping, which reflects the larger set of contracts being brought into scope.
Owner

Similarly, should I think it's weird that there are multiple ContractFiles per file? Maybe ContractFile is not such a good name if it's not actually a file anymore…


**Typechecking and Circular Import Validation**

An important step in the typechecking process is ensuring smooth information flow, and ensuring that scoping makes sense. As described above, we ensure that imports logically make sense and do not get stuck, by ensuring a clear ordering of them – via a topological ordering.
Owner

> An important step in the typechecking process is ensuring smooth information flow, and ensuring that scoping makes sense.

Can you make "smooth" and "makes sense" more specific?


When we initially parse a multi-contract file into a series of files, we map the original file name to a list of source files, each of which **still preserves the one-to-one mapping from file to contract**. With each of these files defined, we can run a modified version of the compiler's existing `buildRoots` algorithm to construct the import tree, which both verifies that there are no cycles and determines which contracts and interfaces to bring into scope.
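
The splitting step can be illustrated with a small sketch. It assumes a hypothetical `SourceUnit` record and `split_file` helper (neither name comes from the SCIF codebase) and only shows the shape of the data: one unit per definition, each carrying a copy of the file-level imports, stored under the original file name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceUnit:
    """One contract or interface plus the imports of the file it came from."""
    file_name: str
    name: str
    imports: tuple


def split_file(file_name, imports, definitions):
    """Turn one parsed file into one SourceUnit per definition.

    Every unit receives a copy of the file-level imports, so each unit still
    looks like a single-contract file to later phases.
    """
    return [SourceUnit(file_name, name, tuple(imports)) for name in definitions]


# File name -> list of units (previously: file name -> single unit).
units_by_file = {
    "file1.scif": split_file(
        "file1.scif", ["file2.scif", "file3.scif"], ["B", "A"]
    ),
}
```

Because every unit keeps the full import list, a later phase that expects one definition per file can process each unit unchanged.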

To accommodate **state**, we map current contracts to a given environment that can be accessed at any time. While previously this had been done simply via the filename, we now extend this to use a serialized version of the filename, followed by the relevant contract name.
Owner

You haven't introduced the concept of "state," so it's not possible to understand what challenges arise here or what your solution might be to them.

What is "a serialized version of the filename itself"? It's a string, so what does it mean to serialize it?


**Compilation**

At the final step of compilation, code generation, we need to ensure that the Solidity output does not mention the same imports repeatedly. This step is only relevant once typechecking and program validation (especially the check for cyclic imports) have run.
Owner

What is an "excessive mentioning"? What would this look like, and what problems would it cause?


Though we initially parse and then verify each file separately, this is only for the purpose of intermediate representation and validation; in the target language, Solidity, we fuse these contracts and their interfaces back together.

To do so, we mark the first `ContractFile` or `InterfaceFile` in our series of source files, based on which was defined first in the original code, with the `firstInFile` tag. Subsequently, in the resulting Solidity code, we remove imports for all files except the one tagged `firstInFile`. Code generation then uses only the `firstInFile` entry to generate the relevant import information, while all subsequently fused files retain only the contracts and interfaces defined within them.
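
A rough sketch of this fusing step follows, under the assumption that each unit is a hypothetical `(imports, body, first_in_file)` tuple in source order. The `emit_solidity` helper and the `.sol` paths are illustrative and do not reflect the compiler's actual code generator.

```python
def emit_solidity(units):
    """Concatenate generated code for units that came from one SCIF file.

    `units` is a list of (imports, body, first_in_file) tuples in source
    order; only the unit tagged first_in_file contributes import lines.
    """
    lines = []
    for imports, body, first_in_file in units:
        if first_in_file:
            lines.extend(f'import "{path}";' for path in imports)
        lines.append(body)
    return "\n".join(lines)


shared_imports = ["./file2.sol", "./file3.sol"]
output = emit_solidity([
    (shared_imports, "interface B {}", True),
    (shared_imports, "contract A {}", False),
])
```

Both units carry the same import list, but the fused output contains each import line once, followed by the two definitions.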
Owner

Why is 'first' in quotes? If you actually want the quotes, please use double quotes (not single quotes).

Owner

I don't understand the goal here. Why is it a good idea to only emit imports for the firstInFile contract? Is the point that a given (actual) file might contain several ContractFiles, but your parser associates the same set of imports with all of them?

If so, it seems like this is masking a deeper problem: contracts are not 1-1 with files anymore. So what you really want here is to separate the concept of a Contract from a ContractFile, and associate the imports with the file itself and not the contract.


Below we show an informal pipeline of the process of going from SCIF $\rightarrow$ Analysis $\rightarrow$ Solidity.

![image-info](./2025-05-13-SCIF/Compiler-Pipeline.png)


### Build System for Research Artifact

SCIF uses [SHErrLoc](https://www.cs.cornell.edu/projects/SHErrLoc/). SHErrLoc is written in Java, but it uses a different build system and different versions of the same dependencies that SCIF uses. Previously, SHErrLoc was duplicated in the repo and not properly linked as a submodule, causing conflicts between compiled bytecode class versions. The build system also did not properly include SHErrLoc, and conflicting versions of CUP could cause sporadic compilation and runtime issues. We work to fix this.
Owner

> We work to fix this.

What did you do, specifically?


Previously, SCIF had no public reference manual for users. We introduce CI to build and publish a public language reference manual. Additionally, SCIF had no sanity checks for contributors. We introduce GitHub Actions workflows to verify that the compiler builds and runs.

Finally, SCIF is currently only runnable through Gradle. This requires users to check out the repository (and submodule), install Java and Gradle, and understand how to set up the repository. This seriously hinders the usability of SCIF as a language, as most users of a compiler simply want to run it. We begin work to untangle hardcoded paths to the local filesystem and package the compiler as a reusable, compiled JAR for distribution.
<!-- We additionally are working to improve the build and run time of the compiler. We've began to see small improvements from our better integration of SHErrLoc, andNumerous builtin contracts are recompiled on every execution, and SHErrLoc ShAdditionally, builtin contract files are recompiled every time the compiler runs, which is un -->
The work here is still ongoing, but it will be a top priority and integral to the larger SCIF project's continued success.

## Performance, Results and Testing

For multi-contract support, we wrote 12 tests, ranging from the simplest multi-contract structure to real-world applications like `Uniswap`, with more than 400 LOC and multiple contracts. We also tested complicated import relationships, which compile successfully without errors. For example,

```
[File 1]
import "file2.scif";
import "file3.scif";
interface B {...}
contract A {...uses B, C, D, E...}

[File 2]
import "file4.scif";
interface C {...uses E...}

[File 3]
interface D {...}

[File 4]
interface E {...}
```

This example compiles. However, if interface `B` uses `A`, interface `C` uses `B`, or interface `C` uses `D`, compilation fails.

## Challenges

Our two biggest challenges were working under a time crunch, due to changing our project track late in the game, and the difficulty of adding optimizations to a codebase which was not entirely our own. We got over the time hurdle by readjusting the scope of the project and working concurrently in a couple of sprints; it was also helpful to divide up tasks based on expertise and familiarity with the SCIF project. Working with the codebase was challenging due to the somewhat sparse documentation in places; to this end, we have added improving the overall documentation of methods to our goals for this compiler going forward.
Owner

> adding on optimizations

I didn't see any actual optimizations here. Do you maybe mean, like, changes, more generically?


## Conclusions

Overall, this was a fascinating project that allowed us to apply a mix of our compilers-related skills and software engineering knowledge to add a few new features and optimizations to a language. Moreover, it was a great experience getting to work on a DSL targeted at a niche area with a unique architecture, with our work having a tangible benefit from a research perspective.

Portions of our project are only tangentially related to the compiler optimization and analysis focus of 6120 itself; however, our work with parsing, feature design and types most definitely goes hand-in-hand with much of the content. Put together, this experience has helped us develop a strong portfolio in working with and improving compilers.