-
Notifications
You must be signed in to change notification settings - Fork 30
SimpleTable with Record, w/o marker class #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
bishabosha
wants to merge
17
commits into
com-lihaoyi:main
from
bishabosha:feature/table-named-tuples-new-hierarchy-record3
Closed
SimpleTable with Record, w/o marker class #82
bishabosha
wants to merge
17
commits into
com-lihaoyi:main
from
bishabosha:feature/table-named-tuples-new-hierarchy-record3
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
evidence not to go with plain named tuples further wip work with projecting named tupke type into the record
3 tasks
closing this PR to focus on the original branch |
lihaoyi
pushed a commit
that referenced
this pull request
May 28, 2025
## Abstract In this PR, introduce a new `com.lihaoyi::scalasql-namedtuples` module, supporting only Scala 3.7+, with three main contributions: - `scalasql.namedtuples.SimpleTable` class as an alternative to `Table`. The pitch is that you can use a "basic" case class with no higher-kinded type parameters to represent your models. - `scalasql.namedtuples.NamedTupleQueryable` - provides implicit `Queryable` so you can return a named tuple from a query. - `scalasql.simple` package object, that re-exports `scalasql` package, plus `SimpleTable` and `NamedTupleQueryable` ## Example **Defining a table** ```scala import scalasql.simple.{*, given} case class City( id: Int, name: String, countryCode: String, district: String, population: Long ) object City extends SimpleTable[City] ``` **return named tuples from queries** ```scala val query = City.select.map(c => (name = c.name, population = c.population)) val res: Seq[(name: String, population: Long)] = db.run(query) ``` ## Design This PR manages to introduce `SimpleTable` entirely without needing to change the core library. It is designed around leveraging the new Named Tuples and programmatic structural typing facilities introduced in Scala 3.7. No macros are needed. It was also designed such that any query should be pretty much source compatible if you change from `Table` to `SimpleTable` - (with the exception of dropping `[Sc]` type arguments). Within a query, e.g. `City.select.map(c => ...)` we still need `c` to be an object that has all the fields of `City`, but they need to be wrapped by either `scalasql.Expr[T]` or `scalasql.Column[T]`. With `Table` this is done by the case class having an explicit type parameter (e.g. `City[T[_]](name: T[String]...)`) - so you would just substitute the parameter. Of course with `SimpleTable` the main idea is that you do not declare this `T[_]` type parameter, but the `scalasql.query` package expects it to be there. The solution in this PR is to represent the table row within queries by `Record[City, Expr]`, (rather than `City[Expr]`). `Record[C, T[_]` is a new class, and essentially a structurally typed tuple that extends `scala.Selectable` with a named tuple `Fields` type member, derived mapping `T` over `NamedTuple.From[C]`. `Record` (and `SimpleTable`) still support using a nested case class field to share common columns (with a caveat*). When you return a `Record[C, T]` from a query, you need to still get back a `C`, so `SimpleTable` provides an implicit `Queryable.Row[Record[C, Expr], C]`, which is generated by compiletime derivation (via `inline` methods). ### Implementation To make a simpler diff, `SimpleTable` is entirely defined in terms of `Table`. i.e. here is the signature: ```scala class SimpleTable[C]( using name: sourcecode.Name, metadata0: Table.Metadata[[T[_]] =>> SimpleTable.MapOver[C, T]] ) extends Table[[T[_]] =>> SimpleTable.MapOver[C, T]](using name, metadata0) ``` The `metadata0` argument is expected to be generated automatically from an inline given in `SimpleTableMacros.scala` (I suggest to rename to `SimpleTableDerivation.scala`) `Table[V[_[_]]`, being a higher kinded type, normally expects some `case class Foo[T[_]]`, and fills in various places `V[Expr]` or `V[Column]` in queries, and `V[Sc]` for results. However for `SimpleTable` when `T[_]` is `scalasql.Sc` we want to return `C` and otherwise return this `Record[C, T]` so `MapOver` needs to be a match type: ```scala object SimpleTable { type MapOver[C, T[_]] = T[Internal.Tombstone.type] match { case Internal.Tombstone.type => C // T is `Sc` case _ => Record[C, T] } } ``` (`Tombstone` is used here to try and introduce a unique type that would never be used for any other purpose, i.e. be disjoint in the eyes of the match type resolver - also so we can convince ourselves that if `T` returns `Tombstone` it is probably the identity and not some accident.) See #83 for another approach that eliminates removes the `V[_[_]]` from `Table`, `Insert` and various other places. **Design of `Record`** `Record[C, T[_]]` is implemented as a structural type that tries to wrap the fields of `C` in `T`. It has a few design constraints: - When `C` has a field of type `X` that is a nested Table, the corresponding field in `Record[C, T]` must also be `Record[X, T]`. - when selecting a nested Record, preserve which type (e.g. `Expr` or `Column`) to wrap fields in from the outer level. - simple types in the IDE First decision: `Record` uses a `Fields` type member for structural selection, rather than the traditional type refinements. Why: - can be constructed without macros - internals can be based on `Array` rather than a hash map, - `Fields` derived via `NamedTuple.From[C]` is treated as part of the class implementation, this means you never get a huge refinement type showing up whenever you hover in the IDE. Second decision: how to decide which fields are "scalar" data and which are nested records. Constraints: - previously with `Table`, the only evidence that a field of type `X` is a nested table is implicit evidence of type `Table.ImplicitMetadata[X]` - match types can only dispatch on statically known information, and there is not currently any match type (or `scala.compiletime` intrinsic) that can tell you if there exists an implicit of type `X`. Choices: - [ ] pre-compute the transitive closure of all possible nested fields as a third type argument to record, which in typical cases would be empty - [ ] require that the field have some marker, e.g. `foo: Ref[Foo]`, unclear how much this would be intrusive at each use-site - [x] introduce a marker class (`SimpleTable.Nested`) that the nested case class should extend - this does however prevent using "third party" classes as a nested table The implicit derivation of Metadata also enforces that whenever an implicit metadata is discovered for use as field, the class must extend `SimpleTable.Nested`. ```scala object SimpleTable { // needs to be a class so the match type reducer can "prove disjoint" to various other types. abstract class Nested final class Record[C, T[_]](private val data: IArray[AnyRef]) extends Selectable: /** * For each field `x: X` of class `C` there exists a field `x` in this record of type * `Record[X, T]` if `X` is a case class that represents a table, or `T[X]` otherwise. */ type Fields = NamedTuple.Map[ NamedTuple.From[C], [X] =>> X match { case SimpleTable.Nested => Record[X, T] case _ => T[X] } ] } } ``` ### Alternatives **Why is there `Record[C, T]` and not `ExprRecord[C]` or `ColumnRecord[C]` classes?** This was explored in #83, which requires a large change to the `scalasql.query` package, i.e. a new type hierarchy for `Table` (but makes more explicit in types the boundary between read-only queries, column updates, and results). It's also unclear if it relies upon "hacks" to work. **Why use `Record[C, T]` and not named tuples in queries?** 1. its almost impossible and (expensive when possible) to preserve the mapping that a large named tuple type (with no reference to the original class) should map back to the class after running the query 2. Also would be ambiguous with when you explicitly want to return a named tuple, rather than map back to the table class. 3. Record is a very cheap association directly back to the class it derives from, it also is a compact type if ever needed to be written explicitly, or shown by an IDE. **What is needed to get rid of `Simpletable.Nested`?** lets remind ourselves of the current definition of `SimpleTable`: ```scala class SimpleTable[C]( using name: sourcecode.Name, metadata0: Table.Metadata[[T[_]] =>> SimpleTable.MapOver[C, T]] ) extends Table[[T[_]] =>> SimpleTable.MapOver[C, T]](using name, metadata0) { given simpleTableGivenMetadata: SimpleTable.GivenMetadata[C] = SimpleTable.GivenMetadata(metadata0) } ``` First thing - we determined that the transitive closure of available implicit `SimpleTable.GivenMetadata[Foo]` needs to be added as an argument to `Record`. In #82 we explored this by just precomputing all the field types ahead of time in a macro, so the types would look a bit like `Record[City, Expr, (id: Expr[Long], name: Expr[String], nested: (fooId: Expr[Long], ...))]` which was very verbose. An alternative could be to pass as a type parameter the classes which have a metadata defined. Something like `Record[City, Expr, Foo | Bar]` or `Record[Foo, Expr, Empty.type]`, and modify the `Record` class as such: ```diff -final class Record[C, T[_]](private val data: IArray[AnyRef]) extends Selectable: +final class Record[C, T[_], <TABLES>](private val data: IArray[AnyRef]) extends Selectable: /** * For each field `x: X` of class `C` there exists a field `x` in this record of type * `Record[X, T]` if `X` is a case class that represents a table, or `T[X]` otherwise. */ type Fields = NamedTuple.Map[ NamedTuple.From[C], - [X] =>> X match { - case Nested => Record[X, T] + [X] =>> IsSub[X, <TABLES>] match { + case true => Record[X, T] case _ => T[X] } ] } ``` This could be a sweet spot between verbosity and extensibility to "uncontrolled" third party classes - but it is uncertain who in reality would be blocked by needing to extend `SimpleTable.Nested`. Also it is still to determine the potential impact on performance of compilation times, also the best place to compute this type without causing explosions of implicit searches. **You can see a prototype here: [bishabosha/scalasql#table-named-tuples-infer-nested-tables](https://github.com/bishabosha/scalasql/tree/feature/table-named-tuples-infer-nested-tables)** ## Build changes introduce top level `scalasql-namedtuples` module - publishes as `com.lihaoyi:scalasql-namedtuples_3` - scalaVersion `3.7.0` - sources are located in `scalasql/namedtuples` - depends on module `scalasql("3.6.2")` - so that it can re-export all of scalasql from the `scalasql.simple` package object Also declare `scalasql-namedtuples.test` module - sources in `scalasql/namedtuples/test` - depends on module `scalasql("3.6.2").test`, so the custom test framework can be used to capture test results. ## Testing changes The main approach to testing was to copy test sources that already exist, and convert them to use SimpleTable with otherwise no other changes. Assumptions made when copying: - the majority of existing `scalasql` tests are testing the query translation to SQL, rather than specifically the implementation of Table.Metadata generated by macros. - Since the only difference between using `SimpleTable` and `Table` are the signatures of implicits available, and the implementation of `Table.Metadata`, the test coverage for `SimpleTable` should focus on type checking, and that the fundamentals of TableMetadata are implemented correctly in a "round trip". - so I copied the tests from `scalasql/test/src/ExampleTests.scala`, `scalasql/test/src/datatypes/DataTypesTests.scala` and `scalasql/test/src/datatypes/OptionalTests.scala`, renaming the traits and switching from `Table` to `SimpleTable`, otherwise unchanged. - I also had to copy `scalasql/test/src/ConcreteTestSuites.scala` to `scalasql/namedtuples/test/src/SimpleTableConcreteTestSuites.scala`, commenting out most objects except `OptionalTests` and `DataTypesTests`, which now extend the duplicated and renamed suites. I also renamed the package to `scalasql.namedtuples` - finally I also copied `scalasql/test/src/WorldSqlTests.scala` (to `scalasql/namedtuples/test/src/example/WorldSqlTestsNamedTuple.scala`) to ensure that every example in `tutorial.md` compiles after switching to `SimpleTable`, and also to provide snippets I will include in the `tutorial.md`. - I also renamed a few tests in the duplicates of `OptionalTests.scala` and `DataTypesTests.scala` so that they would generate unique names that can be included in `reference.md`. New tests: - demonstrations of returning Named Tuples from the various `SimpleTableH2Example` tests - `scalasql/namedtuples/test/src/datatypes/LargeObjectTest.scala` to stress test the compiler for large sized classes. - `scalasql/namedtuples/test/src/example/foo.scala` for quick testing of compilation, typechecking etc. - replacement of case class `copy` method with `Record#updates` in `SimpleTableOptionalTests.scala` ## Documentation changes `tutorial.md` and `reference.md` are generated from scala source files and test results in `docs/generateDocs.mill`. I decided that rather than duplicate both `tutorial.md` and `reference.md` for `SimpleTable`, it would be better to avoid duplication, or potential drift, by reusing the original documents, but include specific notes when use of `SimpleTable` or `NamedTupleQueryable` adds new functionality or requires different code. ### tutorial.md To update `tutorial.md` I wrote the new text as usual in `WorldSqlTests.scala`. These texts exclusively talk about differences between the two approaches, such as declaring case classes, returning named tuples, or using the `updates` method on record. To support the new texts, I needed to include code snippets. But like in `WorldSqlTests.scala` I would prefer the snippets to be verified in a test suite. So the plan was to copy `WorldSqlTests.scala` to a new file, update the examples to use `SimpleTable` and include snippets from there. To support including snippets from another file I updated the `generateTutorial` task in `docs/generateDocs.mill`. The change was that if the scanner sees a line `// +INCLUDE SNIPPET [FOO] somefile` in `WorldSqlTests.scala`, then it switches to reading the lines from `somefile` file, looking for the first line containing `// +SNIPPET [FOO]`, then splices all lines of `some file` until it reaches a line containing `// -SNIPPET [FOO]`, then it switches back to reading the lines in `WorldSqlTests.scala`. The main idea is that snippets within `somefile` should be declared in the same order that they are included from `WorldSqlTests.scala`, meaning that the scanner traverses both files from top to bottom once (beginning from the previous position whenever switching back). So to declare the snippets as mention above I copied `WorldSqlTests.scala` to `scalasql/namedtuples/test/src/example/WorldSqlTestsNamedTuple.scala`, replaced `Table` by `SimpleTable` and declared in there the snippets I wanted (and included them from `WorldSqlTests.scala`) . Any other changes (e.g. newlines, indentation etc) are likely due to updating scalafmt. ### reference.md this file is generated by the `generateReference` task in `docs/generateDocs.mill`. It works by formatting the data from `out/recordedTests.json` (captured by running tests with a custom framework) and grouping tests by the suite they occur in. Like with `tutorial.md` I thought it best to only add extra snippets that highlight the differences between the two kinds of table. So first thing to capture the output of simple table tests, in the build I set the `SCALASQL_RECORDED_TESTS_NAME` and `SCALASQL_RECORDED_SUITE_DESCRIPTIONS_NAME` environment variables in the `scalasql-namedtuples.test` module: in this case `recordedTestsNT.json` and `out/recordedSuiteDescriptionsNT.json`. Next I updated the `generateReference` task so that it also includes the recorded outputs from `recordedTestsNT.json`. This task handles grouping of tests and removing duplicates (e.g. the `mysql`, `h2` variants). I made it so that for each Suite e.g. `DataTypes` it find the equivalent suite in the simple table results, and then only include the test names it hadn't seen at the end of that suite. So therefore to include any test result from `SimpleTableDataTypesTests.scala` or `SimpleTableOptionalTests.scala`, it is only necessary to rename an individual test, and it will be appended to the bottom of the relevant group in `reference.md`. For this PR I did this by adding a ` - with SimpleTable` suffix to relevant tests (i.e. the demonstration of nested classes, and the usage of `Record#updates` method)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
compared to the initial PR #81, this trades in dropping the marker class for a more complex implementation where the type of the Expression has to be re-computed on each
.select
,.insert
etc, and passed as a type parameter to the record class. This also relies upon transparent inline and macros to compute the types (IntelliJ issue maybe?)However as far as I know IntelliJ doesn't support match types either so probably a moot point
inside an expression, you have the type
SimpleTable.Record[City, (name : Expr[String], population : Expr[Int], mayor : SimpleTable.Record[Person, (name : Expr[String], age : Expr[Int])])]
, inferred by metals