Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add print-schema option allow_tables_in_same_query #4500

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

sgoll
Copy link
Contributor

@sgoll sgoll commented Feb 19, 2025

This adds a new config option to print-schema, i.e. allow_tables_in_same_query (in diesel.toml) or --allow-tables-in-same-query (in CLI).

Setting this option to all_tables, or leaving it unset, remains the current behavior of generating a single invocation of the macro allow_tables_to_appear_in_same_query! that lists all tables in the schema.

Setting this option to fk_related_tables changes this behavior: instead of a single invocation, several invocations are issued, one for each group of tables related to each other through foreign keys.

For example, given the following tables with foreign keys:

users (id)
posts (id, user_id -> users(id))
comments (id, post_id -> posts(id))

sessions (id)
transactions (id, session_id -> sessions(id))

cars (id)
bikes (id)

The following macro invocations are generated:

For all_tables (default):

diesel::allow_tables_to_appear_in_same_query!(
    bikes, cars, comments, posts, sessions, transactions, users);

For fk_related_tables:

diesel::allow_tables_to_appear_in_same_query!(comments, posts, users);
diesel::allow_tables_to_appear_in_same_query!(sessions, transactions);

See #4333 for rationale.

@sgoll
Copy link
Contributor Author

sgoll commented Feb 19, 2025

@weiznich When this gets merged, I can also provide a backport for #4490, or for a future release.

@weiznich weiznich requested a review from a team February 21, 2025 09:35
Copy link
Member

@Ten0 Ten0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!
Thanks a lot for your contribution! 😊
It looks like this will definitely be useful for people who have very disconnected large graphs.

Unfortunately my 166 tables graph is largely connected so that won't help me personally 😅 - I'm curious, how is your use-case that is so different?

Otherwise from an implementation standpoint this looks good 😊 - although I did find 2-3 things that I would tend to change.

Thanks again for working on this!

@@ -280,6 +280,14 @@ pub fn build_cli() -> Command {
.action(ArgAction::Append)
.value_parser(PossibleValuesParser::new(print_schema::DocConfig::VARIANTS_STR)),
)
.arg(
Arg::new("allow-tables-in-same-query")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming:

given the name of this argument, at first glance I would expect that I can do:

--allow-tables-to-appear-in-same-query table1 table2

and that that would merge the connected components of table1 and table2 together for generation.

Considering that this may even be something that makes sense and we might want to add later on, how about calling this allow_tables_to_appear_in_same_query_generation_mode or something along those lines ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Initially, I had --allow-tables-to-appear-in-same-query, which suffers from the same problem but seemed to be too verbose to me. Adding a suffix -mode or -generation-mode would make it even longer.

There is precedent for the suffix -config, as in --with-docs-config, so I think I would prefer that over -mode and -generation-mode. What do you think?

The full name then could be --allow-tables-to-appear-in-same-query-config. Still pretty long, but as long as that is okay, I can change it to that.

@@ -0,0 +1,3 @@
[print_schema]
file = "src/schema.rs"
allow_tables_in_same_query = "fk_related_tables"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test should probably be named around that it's testing specifically fk_related_tables, not allow_tables_to_appear_in_same_query itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. I had this originally, but then I ran into PostgreSQL's limit on the length of database names (63 characters) for the databases created by the test suite.

This may be another hint that the option name --allow-tables-to-appear-in-same-query-config is too long 😉

@sgoll
Copy link
Contributor Author

sgoll commented Feb 22, 2025

Unfortunately my 166 tables graph is largely connected so that won't help me personally 😅 - I'm curious, how is your use-case that is so different?

To be honest, my latest project has, at the moment, only a mediocre amount of 27 tables, so I have not been affected as such by the issue outlined in #4333. As in your case, my set of tables is also largely connected, so the changes here only reduce the number of table pairs from 351 down to 301 (25+2 connected tables).

I have been thinking about how else to mitigate the overly exhaustive set of allowed combinations. When splitting into connected components is not sufficient, maybe we can somehow limit the length of join chains (with the rationale that usually there are no more than, say, a handful of tables involved in a single query).

However, since we do not know beforehand which tables will be joined, and the FK-table graph likely has a rather short maximum path length (less than the expected maximum number of table joins, similar to the small-world experiment), this likely would not help reduce the number of table pairs much, if at all.1

Footnotes

  1. In my current set of 25 connected tables, the maximum path length is 4, i.e. with only four joins each table can reach each other table. So, with such a hypothetical configurable code-generation option I could only reduce the number of generated table pairs if I only ever had to join across three tables. But in my code base, I already have joins across up to five tables.

@Ten0
Copy link
Member

Ten0 commented Feb 22, 2025

However, since we do not know beforehand which tables will be joined, and the FK-table graph likely has a rather short maximum path length (less than the expected maximum number of table joins, similar to the small-world experiment), this likely would not help reduce the number of table pairs much, if at all.

Haha I gave this strategy some thought yesterday as well while looking at the issue and PR and had reached the exact same conclusion 😵‍💫

I also considered doing static analysis on existing files to figure out what tables were used in the same file/function, which while it may work practically for a lot of cases, might not work well with reusable query fragments that may live across multiple files (often via #[dsl::auto_type]). Trying to resolve imports becomes a rather large project that would likely require quite a bit of maintenance, and will always be imperfect if macros are involved, so overall it seems to be a bad complexity/benefit ratio. 😕

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants