Skip to content

SimpleTable with Record, w/o marker class #82

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

bishabosha
Copy link
Contributor

@bishabosha bishabosha commented May 5, 2025

compared to the initial PR #81, this trades in dropping the marker class for a more complex implementation where the type of the Expression has to be re-computed on each .select, .insert etc, and passed as a type parameter to the record class. This also relies upon transparent inline and macros to compute the types (IntelliJ issue maybe?)

However as far as I know IntelliJ doesn't support match types either so probably a moot point

import scalasql.dialects.H2Dialect.*
import scalasql.namedtuples.*
import scalasql.*

case class Person(name: String, age: Int)
object Person extends SimpleTable[Person]()

case class City(name: String, population: Int, mayor: Person)
object City extends SimpleTable[City]()

@main def foo =
  City.select.filter(_.name === "foo")
  City.insert.values(City("foo", 42, Person("bar", 23)))
  City.insert.columns(_.name := "foo")
  City.insert.batched(_.name, _.population, _.mayor.name)(("foo", 42, "bar"), ("baz", 23, "qux"))

inside an expression, you have the type SimpleTable.Record[City, (name : Expr[String], population : Expr[Int], mayor : SimpleTable.Record[Person, (name : Expr[String], age : Expr[Int])])], inferred by metals

@bishabosha
Copy link
Contributor Author

closing this PR to focus on the original branch

@bishabosha bishabosha closed this May 17, 2025
lihaoyi pushed a commit that referenced this pull request May 28, 2025
## Abstract

In this PR, introduce a new `com.lihaoyi::scalasql-namedtuples` module,
supporting only Scala 3.7+, with three main contributions:
- `scalasql.namedtuples.SimpleTable` class as an alternative to `Table`.
The pitch is that you can use a "basic" case class with no higher-kinded
type parameters to represent your models.
- `scalasql.namedtuples.NamedTupleQueryable` - provides implicit
`Queryable` so you can return a named tuple from a query.
- `scalasql.simple` package object, that re-exports `scalasql` package,
plus `SimpleTable` and `NamedTupleQueryable`

## Example

**Defining a table**
```scala
import scalasql.simple.{*, given}

case class City(
    id: Int,
    name: String,
    countryCode: String,
    district: String,
    population: Long
)

object City extends SimpleTable[City]
```

**return named tuples from queries**

```scala
val query = City.select.map(c => (name = c.name, population = c.population))
val res: Seq[(name: String, population: Long)] = db.run(query)
```

## Design

This PR manages to introduce `SimpleTable` entirely without needing to
change the core library. It is designed around leveraging the new Named
Tuples and programmatic structural typing facilities introduced in Scala
3.7. No macros are needed.

It was also designed such that any query should be pretty much source
compatible if you change from `Table` to `SimpleTable` - (with the
exception of dropping `[Sc]` type arguments).

Within a query, e.g. `City.select.map(c => ...)` we still need `c` to be
an object that has all the fields of `City`, but they need to be wrapped
by either `scalasql.Expr[T]` or `scalasql.Column[T]`.

With `Table` this is done by the case class having an explicit type
parameter (e.g. `City[T[_]](name: T[String]...)`) - so you would just
substitute the parameter. Of course with `SimpleTable` the main idea is
that you do not declare this `T[_]` type parameter, but the
`scalasql.query` package expects it to be there.

The solution in this PR is to represent the table row within queries by
`Record[City, Expr]`, (rather than `City[Expr]`).
`Record[C, T[_]` is a new class, and essentially a structurally typed
tuple that extends `scala.Selectable` with a named tuple `Fields` type
member, derived mapping `T` over `NamedTuple.From[C]`.

`Record` (and `SimpleTable`) still support using a nested case class
field to share common columns (with a caveat*).

When you return a `Record[C, T]` from a query, you need to still get
back a `C`, so `SimpleTable` provides an implicit
`Queryable.Row[Record[C, Expr], C]`, which is generated by compiletime
derivation (via `inline` methods).

### Implementation

To make a simpler diff, `SimpleTable` is entirely defined in terms of
`Table`. i.e. here is the signature:

```scala
class SimpleTable[C](
    using name: sourcecode.Name,
    metadata0: Table.Metadata[[T[_]] =>> SimpleTable.MapOver[C, T]]
) extends Table[[T[_]] =>> SimpleTable.MapOver[C, T]](using name, metadata0)
```

The `metadata0` argument is expected to be generated automatically from
an inline given in `SimpleTableMacros.scala` (I suggest to rename to
`SimpleTableDerivation.scala`)

`Table[V[_[_]]`, being a higher kinded type, normally expects some `case
class Foo[T[_]]`, and fills in various places `V[Expr]` or `V[Column]`
in queries, and `V[Sc]` for results. However for `SimpleTable` when
`T[_]` is `scalasql.Sc` we want to return `C` and otherwise return this
`Record[C, T]` so `MapOver` needs to be a match type:

```scala
object SimpleTable {
  type MapOver[C, T[_]] = T[Internal.Tombstone.type] match {
    case Internal.Tombstone.type => C // T is `Sc`
    case _ => Record[C, T]
  }
}
```
(`Tombstone` is used here to try and introduce a unique type that would
never be used for any other purpose, i.e. be disjoint in the eyes of the
match type resolver - also so we can convince ourselves that if `T`
returns `Tombstone` it is probably the identity and not some accident.)

See #83 for another approach
that eliminates removes the `V[_[_]]` from `Table`, `Insert` and various
other places.

**Design of `Record`**

`Record[C, T[_]]` is implemented as a structural type that tries to wrap
the fields of `C` in `T`. It has a few design constraints:
- When `C` has a field of type `X` that is a nested Table, the
corresponding field in `Record[C, T]` must also be `Record[X, T]`.
- when selecting a nested Record, preserve which type (e.g. `Expr` or
`Column`) to wrap fields in from the outer level.
- simple types in the IDE

First decision: `Record` uses a `Fields` type member for structural
selection, rather than the traditional type refinements.
Why:
- can be constructed without macros
- internals can be based on `Array` rather than a hash map,
- `Fields` derived via `NamedTuple.From[C]` is treated as part of the
class implementation, this means you never get a huge refinement type
showing up whenever you hover in the IDE.

Second decision: how to decide which fields are "scalar" data and which
are nested records.
Constraints: 
- previously with `Table`, the only evidence that a field of type `X` is
a nested table is implicit evidence of type `Table.ImplicitMetadata[X]`
- match types can only dispatch on statically known information, and
there is not currently any match type (or `scala.compiletime` intrinsic)
that can tell you if there exists an implicit of type `X`.

Choices:
- [ ] pre-compute the transitive closure of all possible nested fields
as a third type argument to record, which in typical cases would be
empty
- [ ] require that the field have some marker, e.g. `foo: Ref[Foo]`,
unclear how much this would be intrusive at each use-site
- [x] introduce a marker class (`SimpleTable.Nested`) that the nested
case class should extend - this does however prevent using "third party"
classes as a nested table

The implicit derivation of Metadata also enforces that whenever an
implicit metadata is discovered for use as field, the class must extend
`SimpleTable.Nested`.

```scala
object SimpleTable {

  // needs to be a class so the match type reducer can "prove disjoint" to various other types. 
  abstract class Nested

  final class Record[C, T[_]](private val data: IArray[AnyRef]) extends Selectable:
    /**
     * For each field `x: X` of class `C` there exists a field `x` in this record of type
     * `Record[X, T]` if `X` is a case class that represents a table, or `T[X]` otherwise.
     */
    type Fields = NamedTuple.Map[
      NamedTuple.From[C],
      [X] =>> X match {
        case SimpleTable.Nested => Record[X, T]
        case _ => T[X]
      }
    ]
  }
}
```

### Alternatives

**Why is there `Record[C, T]` and not `ExprRecord[C]` or
`ColumnRecord[C]` classes?**

This was explored in #83,
which requires a large change to the `scalasql.query` package, i.e. a
new type hierarchy for `Table` (but makes more explicit in types the
boundary between read-only queries, column updates, and results). It's
also unclear if it relies upon "hacks" to work.

**Why use `Record[C, T]` and not named tuples in queries?**
1. its almost impossible and (expensive when possible) to preserve the
mapping that a large named tuple type (with no reference to the original
class) should map back to the class after running the query
2. Also would be ambiguous with when you explicitly want to return a
named tuple, rather than map back to the table class.
3. Record is a very cheap association directly back to the class it
derives from, it also is a compact type if ever needed to be written
explicitly, or shown by an IDE.

**What is needed to get rid of `Simpletable.Nested`?**

lets remind ourselves of the current definition of `SimpleTable`:
```scala
class SimpleTable[C](
    using name: sourcecode.Name,
    metadata0: Table.Metadata[[T[_]] =>> SimpleTable.MapOver[C, T]]
) extends Table[[T[_]] =>> SimpleTable.MapOver[C, T]](using name, metadata0) {
  given simpleTableGivenMetadata: SimpleTable.GivenMetadata[C] =
    SimpleTable.GivenMetadata(metadata0)
}
```

First thing - we determined that the transitive closure of available
implicit `SimpleTable.GivenMetadata[Foo]` needs to be added as an
argument to `Record`.

In #82 we explored this by
just precomputing all the field types ahead of time in a macro, so the
types would look a bit like `Record[City, Expr, (id: Expr[Long], name:
Expr[String], nested: (fooId: Expr[Long], ...))]` which was very
verbose.

An alternative could be to pass as a type parameter the classes which
have a metadata defined. Something like `Record[City, Expr, Foo | Bar]`
or `Record[Foo, Expr, Empty.type]`, and modify the `Record` class as
such:

```diff
-final class Record[C, T[_]](private val data: IArray[AnyRef]) extends Selectable:
+final class Record[C, T[_], <TABLES>](private val data: IArray[AnyRef]) extends Selectable:
    /**
     * For each field `x: X` of class `C` there exists a field `x` in this record of type
     * `Record[X, T]` if `X` is a case class that represents a table, or `T[X]` otherwise.
     */
    type Fields = NamedTuple.Map[
      NamedTuple.From[C],
-     [X] =>> X match {
-       case Nested => Record[X, T]
+     [X] =>> IsSub[X, <TABLES>] match {
+       case true => Record[X, T]
        case _ => T[X]
      }
    ]
  }
```

This could be a sweet spot between verbosity and extensibility to
"uncontrolled" third party classes - but it is uncertain who in reality
would be blocked by needing to extend `SimpleTable.Nested`. Also it is
still to determine the potential impact on performance of compilation
times, also the best place to compute this type without causing
explosions of implicit searches.
**You can see a prototype here:
[bishabosha/scalasql#table-named-tuples-infer-nested-tables](https://github.com/bishabosha/scalasql/tree/feature/table-named-tuples-infer-nested-tables)**

## Build changes
introduce top level `scalasql-namedtuples` module
- publishes as `com.lihaoyi:scalasql-namedtuples_3`
- scalaVersion `3.7.0`
- sources are located in `scalasql/namedtuples`
- depends on module `scalasql("3.6.2")` - so that it can re-export all
of scalasql from the `scalasql.simple` package object

Also declare `scalasql-namedtuples.test` module
- sources in `scalasql/namedtuples/test`
- depends on module `scalasql("3.6.2").test`, so the custom test
framework can be used to capture test results.

## Testing changes

The main approach to testing was to copy test sources that already
exist, and convert them to use SimpleTable with otherwise no other
changes.

Assumptions made when copying:
- the majority of existing `scalasql` tests are testing the query
translation to SQL, rather than specifically the implementation of
Table.Metadata generated by macros.
- Since the only difference between using `SimpleTable` and `Table` are
the signatures of implicits available, and the implementation of
`Table.Metadata`, the test coverage for `SimpleTable` should focus on
type checking, and that the fundamentals of TableMetadata are
implemented correctly in a "round trip".
- so I copied the tests from `scalasql/test/src/ExampleTests.scala`,
`scalasql/test/src/datatypes/DataTypesTests.scala` and
`scalasql/test/src/datatypes/OptionalTests.scala`, renaming the traits
and switching from `Table` to `SimpleTable`, otherwise unchanged.
- I also had to copy `scalasql/test/src/ConcreteTestSuites.scala` to
`scalasql/namedtuples/test/src/SimpleTableConcreteTestSuites.scala`,
commenting out most objects except `OptionalTests` and `DataTypesTests`,
which now extend the duplicated and renamed suites. I also renamed the
package to `scalasql.namedtuples`
- finally I also copied `scalasql/test/src/WorldSqlTests.scala` (to
`scalasql/namedtuples/test/src/example/WorldSqlTestsNamedTuple.scala`)
to ensure that every example in `tutorial.md` compiles after switching
to `SimpleTable`, and also to provide snippets I will include in the
`tutorial.md`.
- I also renamed a few tests in the duplicates of `OptionalTests.scala`
and `DataTypesTests.scala` so that they would generate unique names that
can be included in `reference.md`.

New tests:
- demonstrations of returning Named Tuples from the various
`SimpleTableH2Example` tests
- `scalasql/namedtuples/test/src/datatypes/LargeObjectTest.scala` to
stress test the compiler for large sized classes.
- `scalasql/namedtuples/test/src/example/foo.scala` for quick testing of
compilation, typechecking etc.
- replacement of case class `copy` method with `Record#updates` in
`SimpleTableOptionalTests.scala`

## Documentation changes

`tutorial.md` and `reference.md` are generated from scala source files
and test results in `docs/generateDocs.mill`.

I decided that rather than duplicate both `tutorial.md` and
`reference.md` for `SimpleTable`, it would be better to avoid
duplication, or potential drift, by reusing the original documents, but
include specific notes when use of `SimpleTable` or
`NamedTupleQueryable` adds new functionality or requires different code.

### tutorial.md

To update `tutorial.md` I wrote the new text as usual in
`WorldSqlTests.scala`. These texts exclusively talk about differences
between the two approaches, such as declaring case classes, returning
named tuples, or using the `updates` method on record. To support the
new texts, I needed to include code snippets. But like in
`WorldSqlTests.scala` I would prefer the snippets to be verified in a
test suite. So the plan was to copy `WorldSqlTests.scala` to a new file,
update the examples to use `SimpleTable` and include snippets from
there.

To support including snippets from another file I updated the
`generateTutorial` task in `docs/generateDocs.mill`. The change was that
if the scanner sees a line `// +INCLUDE SNIPPET [FOO] somefile` in
`WorldSqlTests.scala`, then it switches to reading the lines from
`somefile` file, looking for the first line containing `// +SNIPPET
[FOO]`, then splices all lines of `some file` until it reaches a line
containing `// -SNIPPET [FOO]`, then it switches back to reading the
lines in `WorldSqlTests.scala`.

The main idea is that snippets within `somefile` should be declared in
the same order that they are included from `WorldSqlTests.scala`,
meaning that the scanner traverses both files from top to bottom once
(beginning from the previous position whenever switching back).

So to declare the snippets as mention above I copied
`WorldSqlTests.scala` to
`scalasql/namedtuples/test/src/example/WorldSqlTestsNamedTuple.scala`,
replaced `Table` by `SimpleTable` and declared in there the snippets I
wanted (and included them from `WorldSqlTests.scala`) .

Any other changes (e.g. newlines, indentation etc) are likely due to
updating scalafmt.

### reference.md

this file is generated by the `generateReference` task in
`docs/generateDocs.mill`. It works by formatting the data from
`out/recordedTests.json` (captured by running tests with a custom
framework) and grouping tests by the suite they occur in.

Like with `tutorial.md` I thought it best to only add extra snippets
that highlight the differences between the two kinds of table.

So first thing to capture the output of simple table tests, in the build
I set the `SCALASQL_RECORDED_TESTS_NAME` and
`SCALASQL_RECORDED_SUITE_DESCRIPTIONS_NAME` environment variables in the
`scalasql-namedtuples.test` module: in this case `recordedTestsNT.json`
and `out/recordedSuiteDescriptionsNT.json`.

Next I updated the `generateReference` task so that it also includes the
recorded outputs from `recordedTestsNT.json`. This task handles grouping
of tests and removing duplicates (e.g. the `mysql`, `h2` variants). I
made it so that for each Suite e.g. `DataTypes` it find the equivalent
suite in the simple table results, and then only include the test names
it hadn't seen at the end of that suite.

So therefore to include any test result from
`SimpleTableDataTypesTests.scala` or `SimpleTableOptionalTests.scala`,
it is only necessary to rename an individual test, and it will be
appended to the bottom of the relevant group in `reference.md`. For this
PR I did this by adding a ` - with SimpleTable` suffix to relevant tests
(i.e. the demonstration of nested classes, and the usage of
`Record#updates` method)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant