
A lot of code compiled on a small change (local variable name change) #1396

Open
OndrejSpanel opened this issue Sep 16, 2024 · 37 comments
Labels
area/over_compilation Zinc picks up more compilation than needed

Comments

@OndrejSpanel

OndrejSpanel commented Sep 16, 2024

steps

I am afraid I need help to reproduce the issue in a smaller project. My project is sbt-based, Scala 3, several modules, JS + JVM, with about 500 Scala source files.

problem

When I make a small change in my project, like renaming a local variable in a function, a lot of files (>400) are compiled.

expectation

For any change that is internal to a single function and does not affect its interface (parameters or return value), only one file should be compiled.

notes

It is possible something in my sources is contributing to the problem, but given current tools, it is quite hard to learn what exactly the issue is. The complete build log obtained using Level.Debug has over 50 000 lines.

I attach an excerpt obtained using ThisBuild / logLevel := Level.Debug from my sbt build, triggered after I renamed a local variable from add to toAdd:

[debug] compilation cycle 1
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

[debug] compilation cycle 2
[info] compiling 36 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

[debug] compilation cycle 3
[info] compiling 50 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

[debug] compilation cycle 4
[info] compiling 481 Scala sources and 9 Java sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

@eed3si9n
Member

Did you start seeing this behavior with some specific version of sbt (if so, could you identify the first version that does this)?

Do you use macros, lots of implicits, or package objects? Normally Zinc won't invalidate if the change is limited to a local variable (a variable inside of a function or a method).

@Friendseeker
Member

Friendseeker commented Sep 16, 2024

The complete build log obtained using Level.Debug has over 50 000 lines.

Would you like to share the log? There's still a chance that we can extract something useful out of it. Ideally, also share the entire Scala project.

The log hopefully contains the reason for the invalidation somewhere. This is indeed a peculiar issue as changes such as “after I renamed a local variable from add to toAdd” should not have any impact at all. I still need to double check, but I believe Zinc does not give a local variable in a function a name of its own, so none of the name hashes should change.
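
To illustrate the kind of change being discussed, here is a minimal sketch (hypothetical code, not from the project): renaming a local variable changes only the method body, not the class API that Zinc hashes.

object Example {
  def sum(xs: List[Int]): Int = {
    // Before the edit this local was called `add`; after the edit it is `toAdd`.
    // The method signature (name, parameter types, result type) is unchanged,
    // so the extracted API of Example should hash to the same value.
    val toAdd = xs.foldLeft(0)(_ + _)
    toAdd
  }
}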


Also does the log always start with

[debug] compilation cycle 1
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

If that is the case, chances are the 2 files are always the same 2 files.

@OndrejSpanel
Author

If that is the case, chances are the 2 files are always the same 2 files.

Yes, it always starts this way. The issue is 100% reproducible. For me the logs contain too much information; it is hard for me to digest them and learn what is going on.

I can share the log with you privately if you suggest a suitable means to do so. As for the project, I would be very reluctant to share it unless there is some NDA in place.

@Friendseeker
Member

If that is the case, chances are the 2 files are always the same 2 files.

Yes, it always starts this way. The issue is 100% reproducible. For me the logs contain too much information; it is hard for me to digest them and learn what is going on.

I can share the log with you privately if you suggest a suitable means to do so. As for the project, I would be very reluctant to share it unless there is some NDA in place.

I see. For the log, if you use Discord, my Discord ID is vertex1003. You can send a Google Drive link containing the log via Discord DM, and once I have downloaded the log I will reply to you, so you can then immediately disable the Google Drive link.

If you use another messaging app, tell me the app name and your ID and I will register an account and add you there.

By the way, may I share the segment of the log relevant to this issue with other Scala tooling contributors for the sake of debugging?

@eed3si9n
Member

@OndrejSpanel Could you try:

Compile / incOptions ~= { _.withTransitiveStep(10) }

I noticed that you ran out of transitive steps (the default is 3, I think), which explains why cycle 4 is super big.

@Friendseeker
Member

Friendseeker commented Sep 16, 2024

@OndrejSpanel Could you try:

Compile / incOptions ~= { _.withTransitiveStep(10) }

I noticed that you ran out of transitive steps (the default is 3, I think), which explains why cycle 4 is super big.

Going slightly off-topic, but should we consider raising the default transitive step to something higher than 3? I was reading the source code of the IntelliJ Scala Plugin and noticed their Scala Language Server sets the transitive step to 5.

@eed3si9n
Member

At the rate that it expanded from 2 to 36 files, I am still guessing it's something to do with macro invalidation, but if this fixes the problem maybe we could consider upping the default to 6.

@OndrejSpanel
Author

I have added Compile / incOptions ~= { _.withTransitiveStep(10) }. With this setting I get:

[debug] compilation cycle 1
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 1 Scala source to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 15 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

This is much better, but still quite a lot for a renamed local variable.

For the log, if you use Discord

I have the logs prepared; Discord friend request sent (username ospanel).

@OndrejSpanel
Author

OndrejSpanel commented Sep 17, 2024

With { _.withTransitiveStep(30) } the logs are the same. For the repro I rename the same variable between add and toAdd. Sometimes I get similar, but different, logs:

[debug] compilation cycle 1
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 36 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 50 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

As for macros, I use several libraries which use them quite extensively, namely borer, circe, quicklens and airframe.surface.

@OndrejSpanel
Author

With { _.withTransitiveStep(10).withRelationsDebug(true).withApiDebug(true) } there is:

Detected API change because net.gamatron.json.Schema contains a macro definition

This object uses airframe.surface, namely my "light" fork of it (https://github.com/OpenGrabeso/light-surface/).

@OndrejSpanel
Author

OndrejSpanel commented Sep 17, 2024

I am trying to create a repro using surface. Can someone help me understand what to look for when I see this in the sbt debug output:

[debug] [diff] Detected API change because net.gamatron.json.Schema contains a macro definition.
[debug] [inv]
[debug] [inv] Changes:
[debug] [inv] API Changes: Set(APIChangeDueToMacroDefinition(net.gamatron.json.Schema))
[debug] Invalidating (transitively) by inheritance from net.gamatron.json.Schema...
[debug] Initial set of included nodes: net.gamatron.json.Schema
[debug] Invalidated by transitive inheritance dependency: Set(net.gamatron.json.Schema)
[debug] The following member ref dependencies of net.gamatron.json.Schema are invalidated:
[debug] aaa.bbb.ccc.Ccccc
[debug] aaa.bbb.ccc.Dddd
[debug] aaa.bbb.ccc.ddd.EEEE

I am trying to include various constructs that those classes use from Schema, but I do not see the issue in my repro so far.

Some things Schema includes, which the classes mentioned as "member ref dependencies" use:

  // macro declaration
  def enumerateSubclassesImpl[T: Type](using Quotes): Expr[List[Surface]] = ???

  inline def enumerateSubclasses[T]: List[Surface] = ${ enumerateSubclassesImpl[T] }

  inline def listDerivedClasses[T]: List[TypeDesc] = enumerateSubclasses[T].map(TypeDesc.apply(_))

  // annotation declaration
  type hiddenType = hidden

  // type declarations
  case class TypeDesc(surface: Surface)

What are "member ref dependencies" and when are they invalidated? I am not even sure why is net.gamatron.json.Schema invalidated on my change, as it does not seem to be used in the file I am changing.

Almost all classes listed in the first step as "by member reference" use the listDerivedClasses macro, but not all of them - some are just simple sealed traits, with no references to Schema at all.

@Friendseeker
Member

Friendseeker commented Sep 17, 2024

What are "member ref dependencies" and when are they invalidated?

Member ref dependencies are basically a direct reference from one class to another.

class A

class B {
  val a = new A
}

Then B has a member ref dependency on A, as B refers to A.

As for why they are invalidated, there are lots of ways they can be invalidated. One common way is that if a class has macros (either contains a macro call site or a macro implementation), then all of its member ref dependents are invalidated, which does appear to be what is taking place, judging by the appearance of APIChangeDueToMacroDefinition in the log.
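
A minimal Scala 3 sketch (hypothetical names, not from the project) of the situation described above: one file defines a macro, and a member ref dependent in another file calls it.

// File: Macros.scala
import scala.quoted.*

object Macros {
  // Macro implementation plus an inline entry point; the expansion happens at the call site.
  def helloImpl(using Quotes): Expr[String] = Expr("hello")
  inline def hello: String = ${ helloImpl }
}

// File: Greeter.scala
// Greeter has a member ref dependency on Macros. Because Macros defines a macro,
// an API change reported for Macros conservatively invalidates Greeter as well,
// since the expansion at the call site may depend on anything Macros can see.
class Greeter {
  def greet: String = Macros.hello
}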

I am not even sure why net.gamatron.json.Schema is invalidated by my change, as it does not seem to be used in the file I am changing.

I am looking into it. The debug log should contain the reason, but the log is a bit hard to read... which I guess is an area Zinc could improve on.

@OndrejSpanel
Author

If you think there is any experiment I could run on my project or any other information you want, just ask. I am very glad you care about the issue and work on solving it. Thanks a lot.

@OndrejSpanel
Author

I noticed that you ran out of transitive steps (the default is 3, I think), which explains why cycle 4 is super big.

Going slightly off-topic, but should we consider raising the default transitive step to something higher than 3? I was reading the source code of the IntelliJ Scala Plugin and noticed their Scala Language Server sets the transitive step to 5.

I confirm that _.withTransitiveStep(5) is enough for my project to avoid ~500 files being compiled in compilation cycle 3; I see only 15-50 files, same as with _.withTransitiveStep(10).

@OndrejSpanel
Author

OndrejSpanel commented Sep 17, 2024

One more observation: I do not even have to rename a local variable. Just editing a comment in (almost) any source is enough.

I see very similar messages as before, only 1 Scala source is compiled in compilation cycle 1, not 2:

[debug] compilation cycle 1
[info] compiling 1 Scala source to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

[debug] compilation cycle 2
[info] compiling 1 Scala source to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

[debug] compilation cycle 3
[info] compiling 15 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

[debug] Change APIChangeDueToMacroDefinition(net.gamatron.json.Schema) invalidates 31 classes due to The net.gamatron.json.Schema source file declares a macro.
[debug]   > by transitive inheritance: Set(net.gamatron.json.Schema)

@eed3si9n
Member

If the package object gets shaken, it's game over. So my suggestion would be to split the source code as much as possible for anything that might invalidate the package object, or maybe just avoid package objects altogether.
I think Scala 3 was trying to deprecate them at some point? - https://docs.scala-lang.org/scala3/reference/dropped-features/package-objects.html
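
A minimal sketch of that suggestion, with hypothetical names (not from the project): in Scala 3 the members of a package object can become top-level definitions in an ordinary source file.

// Hypothetical example. Instead of:
//
//   package object state {
//     type Health = Int
//     def defaultHealth: Health = 100
//   }
//
// the same members can live as top-level definitions in a regular file, so an
// edit elsewhere is less likely to ripple through the entire package.
package net.example.state

type Health = Int
def defaultHealth: Health = 100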

@OndrejSpanel
Author

Thanks for the tip. I will check what package objects could be involved; I certainly have some present. (Or if the logs tell more about which package object in particular is involved, it will certainly help me.) And to be sure I understand: by "shaken" you mean "invalidated"? Still, I do not understand why changing a comment or a local variable name shakes anything at all.

@OndrejSpanel
Author

When I replaced my most prominent package object by expanding it into top-level definitions instead, the result is even worse; on a comment change I get:

[debug] compilation cycle 1
[info] compiling 19 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 2
[info] compiling 46 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...
[debug] compilation cycle 3
[info] compiling 56 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.0\classes ...

@Friendseeker
Member

Friendseeker commented Sep 17, 2024

Thanks for the tip. I will check what package objects could be involved; I certainly have some present. (Or if the logs tell more about which package object in particular is involved, it will certainly help me.)

When changing a local variable name in a file that contains a macro (either an implementation or a call site), Zinc performs very aggressive invalidation, since macros expand at the call site.

I still need to think more about what is going on. I believe that Zinc's macro invalidation strategy is that when a source file containing a macro is invalidated, all downstream files (all member ref dependents of the macro-containing file) are invalidated. But I would assume that net.gamatron.json.Schema is an upstream class. It is probably not supposed to be invalidated, yet it is somehow invalidated...

And to be sure I understand: by "shaken" you mean "invalidated"? Still, I do not understand why changing a comment or a local variable name shakes anything at all.

Yes, "shaken" means invalidated.

@OndrejSpanel
Author

OndrejSpanel commented Sep 17, 2024

to split the source code as much as possible for anything that might invalidate the package object,

Can you give me some examples of what to avoid?

upstream class.

Downstream / upstream - I am afraid I am not sure yet what it means. In this case Schema is a utility class containing a macro and other things. It should not depend on anything other than the surface library, I think.

In this repo I try to replicate how it is used in the real project - https://github.com/OndrejSpanel/incremental-repro - but so far I have not been able to reproduce the issue. I will continue to add more code, but any hints on what the problematic construct might be would be helpful.

@Friendseeker
Member

Downstream / upstream - I am afraid I am not sure yet what it means.

Upstream means the class is relatively higher up in the dependency tree. For example, if class A depends on class B, then B is upstream relative to A.
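
In code form (hypothetical classes):

class B                    // upstream: B depends on nothing here
class A { val b = new B }  // downstream: A depends on B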

In this case Schema is a utility class containing a macro and other things. It should not depend on anything other than the surface library, I think.

Thanks for the info! This is really important information that helps to narrow down possibilities.

@Friendseeker
Member

In this repo I try to replicate how it is used in the real project - https://github.com/OndrejSpanel/incremental-repro - but so far I have not been able to reproduce the issue. I will continue to add more code, but any hints on what the problematic construct might be would be helpful.

I will try to see if I can talk to the Scala 3 compiler devs. I don't have any idea what the problematic construct might be, but they might have an idea of what constructs can cause the issue.

OndrejSpanel added a commit to OndrejSpanel/IncRepro that referenced this issue Sep 18, 2024
@OndrejSpanel
Author

OndrejSpanel commented Sep 18, 2024

I have created a standalone repro - https://github.com/OndrejSpanel/IncRepro. It is about 15 Scala source files, referencing each other in various ways, including opaque types and exports, sometimes using unused imports, with some package objects present.

To see the issue:

  • compile
  • change comment located in object WC
  • compile again

The output I see after the second compile:

[IJ]compile
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] done compiling
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] done compiling
[info] compiling 2 Scala sources to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] done compiling
[success] Total time: 1 s, completed 19. 9. 2024 1:18:44

(I hope it will be useful; I spent more than 10 hours reducing it to this shape.)

@Friendseeker Friendseeker added area/over_compilation Zinc picks up more compilation than needed and removed needs info labels Sep 27, 2024
@OndrejSpanel
Author

If there is any progress with this issue, please let me know. Even if it is not resolved, just knowing what kinds of constructs trigger this would immensely help me, as I could refactor my sources accordingly. The long compile times on trivial changes are killing me.

@Friendseeker
Member

If there is any progress with this issue, please let me know. Even if it is not resolved, just knowing what kinds of constructs trigger this would immensely help me, as I could refactor my sources accordingly. The long compile times on trivial changes are killing me.

Sorry for the lack of communication in the last few days. I plan to spend this entire weekend looking into this issue and will share any intel found as a comment.

@Friendseeker
Member

Friendseeker commented Sep 28, 2024

I didn't find the root cause yet, but did some narrowing down of the issue and found out that

val removedProducts: Set[VirtualFileRef] =
  lookup.removedProducts(previousAnalysis).getOrElse {
    new ParVector(previous.allProducts.toVector)
      .filter(p => {
        // println(s"removedProducts? $p")
        !equivS.equiv(previous.product(p), stamps.product(p))
      })
      .toVector
      .toSet
  }

has an incorrect result. Just by changing a comment, removedProducts returned 17 entries, causing mass invalidation.

(Screenshot attached: 2024-09-28, 2:54 PM)

More specifically, previous.allProducts.toVector contains 43 entries, and for 17 of them, !equivS.equiv(previous.product(p), stamps.product(p)) returns true.


It is also very weird that despite also using https://github.com/OndrejSpanel/IncRepro, I am somehow getting different logs than #1396 (comment). For me the log is:

[debug] compilation cycle 1
[info] compiling 15 Scala sources to /Users/jiahuitan/IncRepro/target/scala-3.5.0/classes ...

@eed3si9n
Member

Sorry for the lack of communication in the last few days. I plan to spend this entire weekend looking into this issue and will share any intel found as a comment.

ok. Reminder to everyone that

  1. Zinc is open source software with no warranties. This means that it's a free sandwich, and any additional ham and cheese should ideally be added by the user themselves, or purchased via some consultancy who can.
  2. Friendseeker is a student who is studying Computer Science.

If working on Zinc is fun, please by all means keep doing what you're doing, but please do not feel obligated to do anything.

@OndrejSpanel
Author

[info] compiling 15 Scala sources

I see this when I modify the project using the IntelliJ IDE with Compiler-based error highlighting. In this mode the IDE interacts with the build system and skews the results very much. I have learned the hard way that to learn anything useful about the sbt build process, the IDE should not be open, or should at least be in Built-in highlighting mode.

When I use notepad to edit the files, I consistently get the results I have shown:

PS C:\Dev\inc-repro> sbt compile
[info] welcome to sbt 1.10.1 (Eclipse Adoptium Java 21.0.3)
[info] loading global plugins from C:\Users\Ondra.sbt\1.0\plugins
[info] loading project definition from C:\Dev\inc-repro\project
[info] loading settings for project inc-repro from build.sbt ...
[info] set current project to IncRepro (in build file:/C:/Dev/inc-repro/)
[info] Executing in batch mode. For better performance use sbt's shell
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] compiling 2 Scala sources to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[success] Total time: 4 s, completed 29. 9. 2024 0:56:23

I am running this on Win11 x64, Java 21.

@Friendseeker
Member

Friendseeker commented Sep 28, 2024

Sorry for the lack of communication in the last few days. I plan to spend this entire weekend looking into this issue and will share any intel found as a comment.

ok. Reminder to everyone that

  1. Zinc is open source software with no warranties. This means that it's a free sandwich, and any additional ham and cheese should ideally be added by the user themselves, or purchased via some consultancy who can.
  2. Friendseeker is a student who is studying Computer Science.

If working on Zinc is fun, please by all means keep doing what you're doing, but please do not feel obligated to do anything.

Thanks Eugene for reminding me of the facts. While Ondrej has been extremely helpful in investigating the issue and very nice in general, that indeed does not mean I have an obligation to resolve this ticket by doing a weekend crunch.

I will keep investigating this ticket, but I shall prioritize playing with my cat during the long weekend instead of directing all my energy towards this issue.

@Friendseeker
Member

Friendseeker commented Sep 28, 2024

[info] compiling 15 Scala sources

I see this when I modify the project using the IntelliJ IDE with Compiler-based error highlighting. In this mode the IDE interacts with the build system and skews the results very much. I have learned the hard way that to learn anything useful about the sbt build process, the IDE should not be open, or should at least be in Built-in highlighting mode.

When I use notepad to edit the files, I consistently get the results I have shown:

PS C:\Dev\inc-repro> sbt compile
[info] welcome to sbt 1.10.1 (Eclipse Adoptium Java 21.0.3)
[info] loading global plugins from C:\Users\Ondra.sbt\1.0\plugins
[info] loading project definition from C:\Dev\inc-repro\project
[info] loading settings for project inc-repro from build.sbt ...
[info] set current project to IncRepro (in build file:/C:/Dev/inc-repro/)
[info] Executing in batch mode. For better performance use sbt's shell
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] compiling 1 Scala source to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[info] compiling 2 Scala sources to C:\Dev\inc-repro\target\scala-3.5.0\classes ...
[success] Total time: 4 s, completed 29. 9. 2024 0:56:23

I am running this on Win11 x64, Java 21.

Thanks. Oh my, there are a lot of holes. One is that Compiler-based error highlighting ruins class file reproducibility. I used an online file hash generator and found that the class file hashes are indeed different across different runs...

I shall try using Built-in highlighting mode and see if I can reproduce the same logs.

@OndrejSpanel
Author

  1. any additional ham and cheese should ideally be added by the user themselves, or purchased via some consultancy who can.

For information: I have created a post about this on scala-users.

@Friendseeker
Member

Friendseeker commented Sep 30, 2024

I have figured out what caused net.gamatron.json.Schema to be invalidated. Here's the invalidation chain.

net.opengrabeso.json.Schema -> net.opengrabeso.json.ASchema
net.opengrabeso.json.ASchema -> net.opengrabeso.engine.tx
net.opengrabeso.engine.tx -> net.opengrabeso.engine.MTTx, 
net.opengrabeso.engine.MT -> net.opengrabeso.engine.MTOpaque, 
net.opengrabeso.engine.MTOpaque -> net.opengrabeso.p.OM, 
net.opengrabeso.p.OM -> net.opengrabeso.p.OMShaders, 
net.opengrabeso.p.OMShaders -> net.opengrabeso.p.state.HM, 
net.opengrabeso.p.state.HM -> net.opengrabeso.p.World, 
net.opengrabeso.p.World -> net.opengrabeso.p.s.SLog$.HDesc, 
net.opengrabeso.p.s.SLog$.HDesc -> net.opengrabeso.p.state.package, 
net.opengrabeso.p.state.package -> net.opengrabeso.p.State, 
net.opengrabeso.p.State -> net.opengrabeso.p.state.WC, 

Basically, Zinc has a mechanism called "third order invalidation", introduced by Dale in #1282. Zinc tracks, in a crude manner, the list of all (transitive) classes that a macro (in this case net.opengrabeso.json.Schema) depends on, and if any of these changes, Zinc invalidates the macro net.opengrabeso.json.Schema, as any change to any of these classes may affect the behaviour of the macro.
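
A rough sketch of that rule (this is not Zinc's actual code, just the shape of the idea, with illustrative names): if any class in a macro-defining class's transitive dependency set has been invalidated, the macro-defining class is invalidated too.

def thirdOrderInvalidations(
    invalidated: Set[String],              // class names invalidated so far
    macroClasses: Set[String],             // classes that define macros
    transitiveDeps: String => Set[String]  // class -> its transitive dependencies
): Set[String] =
  macroClasses.filter(m => transitiveDeps(m).exists(invalidated.contains))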

@OndrejSpanel I don't have access to your entire project, but if possible, breaking any part of the chain should ease the issue.

In the long term, we need a better mechanism for macro invalidation due to upstream changes. Bytecode-based call graph analysis is one promising candidate. The issue with bytecode-based analysis, though, is that it takes significant effort to implement. Despite its obvious benefits (much more refined third order invalidation and potentially a better testQuick), I don't see it being done anytime soon.

By the way, any other ideas for how third order invalidation can be improved are welcome. The best-case scenario is that we can come up with some macro invalidation strategy that is almost as good as bytecode analysis but takes significantly less effort to implement. Although no one has ever come up with such a strategy, few people have looked into this issue anyway.

@OndrejSpanel
Author

if possible, breaking any part of the chain should ease the issue

And it really does. When I moved the macro enumerateSubclasses into a separate object SchemaMacros with minimal dependencies, I no longer see the issue.
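
For reference, a sketch of the shape of that refactoring (hypothetical and simplified; the real code returns airframe-surface types): the macro machinery lives in a small object with minimal dependencies, so the set of classes whose changes can invalidate it stays small.

// File: SchemaMacros.scala
import scala.quoted.*

object SchemaMacros {
  // Only the macro lives here, depending on as little as possible.
  def enumerateSubclassesImpl[T: Type](using Quotes): Expr[List[String]] = ???
  inline def enumerateSubclasses[T]: List[String] = ${ enumerateSubclassesImpl[T] }
}

// File: Schema.scala
object Schema {
  // Schema keeps its other members and merely forwards to the macro object.
  inline def listDerivedClasses[T]: List[String] = SchemaMacros.enumerateSubclasses[T]
}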

Thanks for the analysis—I really appreciate it! (I hope you had a great time with your cat.) I'd like to express my gratitude in a more tangible way. I've sent you an invite, so you'll be able to find my contact information if you're interested.

@OndrejSpanel
Author

I have figured out what caused net.gamatron.json.Schema to be invalidated. Here's the invalidation chain.

Can you describe how I can analyze the logs so that I am able to find such an invalidation chain on my own? While breaking this chain helps in the repro, there must be some other, quite different type of chain in the real project, as even without any macros at all I get the following behaviour on trivial code changes:

[IJ]jvmCommon/compile
[info] compiling 2 Scala sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.1\classes ...
[info] compiling 35 Scala sources and 9 Java sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.1\classes ...
[info] compiling 70 Scala sources and 9 Java sources to C:\Dev\tempi\jvmCommon\target\scala-3.5.1\classes ...

This is indeed a peculiar issue as changes such as “after I renamed a local variable from add to toAdd” should not have any impact at all

Even without a macro I can see in the real project that even trivial changes (changing a comment, renaming a local variable) cause compilation of at least two Scala source files. Now I could spend another day creating another repro, but perhaps understanding the logs and being able to identify the invalidation chains would be more productive?

What makes understanding the logs hard for me is that there is a lot of information dumped by sbt / zinc about files / classes which, I assume, were checked for invalidation but eventually not invalidated. I am considering creating some tool that would parse the logs and show just the invalidation information from them.
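
A minimal sketch of such a filter (the keywords are guesses based on the log lines shown in this thread, not a stable format):

import scala.io.Source

// Print only log lines that look invalidation-related; drop everything else.
@main def filterInvalidations(logPath: String): Unit =
  val keywords = Seq("Invalidat", "invalidat", "API change", "APIChange")
  val source = Source.fromFile(logPath)
  try source.getLines().filter(line => keywords.exists(k => line.contains(k))).foreach(println)
  finally source.close()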

@Friendseeker
Member

Friendseeker commented Sep 30, 2024

Can you describe how I can analyze the logs so that I am able to find such an invalidation chain on my own?

When withRelationsDebug(true) is set, Zinc prints the relations between classes, from which we can extract the chain. However, in practice, I found it very hard to work with, as the relation graph can be large and becomes difficult to trace.
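
For reference, here are the debug settings used earlier in this thread, gathered into one build.sbt sketch:

// build.sbt
ThisBuild / logLevel := Level.Debug
Compile / incOptions ~= { _.withTransitiveStep(10).withRelationsDebug(true).withApiDebug(true) }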

While breaking this chain helps in the repro, there must be some other, quite different type of chain in the real project, as even without any macros at all I get the following behaviour on trivial code changes

Eugene suggested a package object might be causing mass invalidation. Here's the relevant log:

[debug] [inv] Package object invalidations: net.gamatron.procas.game.state.package

Try to remove the net.gamatron.procas.game.state package object if possible (e.g. by replacing it with a regular object). This might address the issue.

but perhaps understanding the logs and being able to identify the invalidation chains would be more productive?

I do feel like relying on the user to know library internals, and on logs that are tightly coupled with library internals, indicates some deeper problem. Fundamentally, there's this common sentiment that Scala has poor tooling, and the particular problem we are investigating is merely a special case of it.

While I currently have no intention of taking any bounty or receiving any form of monetary compensation, I do feel like some form of incentive structure is the proper way forward. Ideally the Scala Center/Lightbend could hire more people to work on Scala tooling full time (but that's easier said than done). Li Haoyi-style bounties also seem to be quite effective.

@OndrejSpanel
Author

OndrejSpanel commented Oct 1, 2024

Part of the issue in the real project is #867 (comment):

My experience seems to be that name hashing does not prevent the invalidation from spreading though. In the example I described above in the sbt project, it did two cycles even though only Foo.scala actually changed and no other file depended on Foo.scala.

My Java files are annotations only (because of RetentionPolicy.RUNTIME). Once any of them is invalidated (which at the moment always happens), they seem to invalidate all classes they are used with.

Once I disabled pipelining, the behavior is much more reasonable; I usually get only 2 compilation cycles, not 3 or more. (Even 2 are too many for a trivial change; I am working on a reproducer, but the progress is slow.)
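
For reference, pipelining is controlled by the usePipelining sbt setting; a build.sbt sketch of keeping it off, as described above (assuming it was enabled via this setting in the first place):

// build.sbt: build pipelining disabled (the default); set to true to enable it
ThisBuild / usePipelining := false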

I am not sure why any of the Java files was invalidated in the first place, and so far I have been unable to examine the invalidation chains, so I cannot really tell.

@Friendseeker
Member

My Java files are annotations only (because of RetentionPolicy.RUNTIME). Once any of them is invalidated (which at the moment always happens), they seem to invalidate all classes they are used with.

I believe this is due to #630

Once I disabled pipelining, the behavior is much more reasonable; I usually get only 2 compilation cycles, not 3 or more. (Even 2 are too many for a trivial change; I am working on a reproducer, but the progress is slow.)

Yeah, unfortunately pipelining can have some surprising interactions with incremental compilation.

// Include all Java sources into each Scala compilation cycle during pipelining.
// In normal compilation without pipelining the product of javac compilation would
// be available in /classes directory, but with pipelining javac compilation is
// deferred till later.
// This means that for each cycle (A.scala), (B.scala, ...), etc all Java sources
// must be included into scalac invocation so the sources that are being recompiled
// get to see the symbols coming from Java.
// See also sbt/zinc#918
val nextChangedSources: Set[VirtualFileRef] =
  if (continue && !handler.isFullCompilation && options.pipelining) javaSources
  else Set.empty
// Return immediate analysis as all sources have been recompiled
copy(
  if (continue && !handler.isFullCompilation) nextInvalidations else Set.empty,
  nextChangedSources,
  binaryChanges = IncrementalCommon.emptyChanges,
  previous = current,
  cycleNum = cycleNum + 1,
)
}

I am not sure why any of the Java files was invalidated in the first place, and so far I have been unable to examine the invalidation chains, so I cannot really tell.

The best thing we can do right now is to come up with a minimal reproduction. Even if we could examine the chain, it is likely that the chain would not reveal very much. Frequently Zinc's invalidation is not quite as orthodox as explicitly following a chain, due to the name hashing optimization and various special cases that override name hashing.
