
object comments may be prohibited in favour of LLM-generated docs #3993

Open
yegor256 opened this issue Mar 11, 2025 · 23 comments

@yegor256
Member

yegor256 commented Mar 11, 2025

How about we prohibit comments entirely, except for atoms? Objects must be self-explainable. Later, we can generate documentation for them with the help of LLMs, either live (in IDE) or statically (via eoc docs CLI command).

Atoms, on the other hand, may have comments, because they don't have code sources.

@yegor256
Member Author

@maxonfjvipon @volodya-lombrozo thoughts?

@maxonfjvipon
Member

@yegor256 so the source code will not contain any documentation at all? Where should I go then if I want to know what the specific object does?

@h1alexbel
Member

h1alexbel commented Mar 15, 2025

@yegor256 @maxonfjvipon how about we allow comments (EODocs) at the syntax level, but make lints warn about them? If a programmer still wants to write a comment, they can add +unlint written-eodoc. WDYT?

@yegor256
Member Author

@maxonfjvipon we can automatically generate comments inside objectionary/home. Also, we can make our IDE plugin generate them on the fly.

@volodya-lombrozo
Member

@yegor256 I strongly disagree. In most cases, objects should be self-explanatory—this is true. However, quite often, we need to write not about an object's functionality but to answer the question of why an object was implemented this way or why it even exists. None of the solutions will answer this question or generate appropriate documentation. Moreover, some comments might clarify the implementation, such as in the following code:

[] > distance
  [x1 y1 x2 y2] > @
    sqrt
      add
        square (sub x2 x1)
        square (sub y2 y1)

We might have this human-readable comment:

# Euclidean distance formula:
# d = ((x₂ - x₁)² + (y₂ - y₁)²)^0.5
[] > distance
  [x1 y1 x2 y2] > @
    sqrt
      add
        square (sub x2 x1)
        square (sub y2 y1)

Btw, what about PDD puzzles?

@volodya-lombrozo
Member

@yegor256 Can we close this one?

@yegor256
Member Author

yegor256 commented Mar 22, 2025

@volodya-lombrozo the future is coming: LLMs will be as smart as (or smarter than) people. They will be able to understand the code and write even better comments than in your example (the one with the "Euclidean distance"). We can ask programmers to write code as clean as necessary for an LLM to understand it and write proper text for it. In other words, we'll restrict programmers for the sake of higher readability.

PDD puzzles we can move to metas:

+package foo
+version 1.0.0
+todo #42 This is the text of the puzzle, not directly connected to a line of code, but to the entire file

WDYT?
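As a sketch of how tooling might consume such metas (hypothetical: the +todo meta format follows the example above; a collector like this is an assumption of the proposal, not an existing eoc command):

```python
import re

# Matches metas such as "+todo #42 Some text" at the start of a line.
TODO_META = re.compile(r"^\+todo\s+#(\d+)\s+(.*)$", re.MULTILINE)

def collect_puzzles(source: str) -> list[tuple[int, str]]:
    """Return (issue number, text) pairs for every +todo meta in an EO file."""
    return [(int(n), text) for n, text in TODO_META.findall(source)]

src = """+package foo
+version 1.0.0
+todo #42 This is the text of the puzzle
"""
print(collect_puzzles(src))  # [(42, 'This is the text of the puzzle')]
```

Because the meta is attached to the whole file rather than to a line of code, such puzzles survive any refactoring of the object bodies below them.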

@volodya-lombrozo
Member

@yegor256 I'm afraid this future is still far from reality and might never happen, actually. Comments aren't only about the inability to express a developer's thoughts through code; sometimes, they explain why this code even exists.

No matter how simple, concise, and clear your code may end up being, it’s impossible for code to be completely self-documenting. Comments can never be replaced by code alone.

Code Tells You How, Comments Tell You Why

Code can’t explain why the program is being written, and the rationale for choosing this or that method. Code cannot discuss the reasons certain alternative approaches were taken.

Jef Raskin

I suggest waiting until AI is able to generate this comment, for example. If it can do that, then this feature will be totally reasonable.

What do you think?

@yegor256
Member Author

@volodya-lombrozo how about we invite others to this discussion: https://t.me/eolang_org/1549

@MCJOHN974

@yegor256 I personally don't believe LLMs are able to write good comments in ALL situations, so I agree with @volodya-lombrozo about letting users write comments but warning them that the language discourages human-written comments. However, I think you can prove that we are wrong!

I think you can create a repo with some really hard-to-understand code (for example, complicated mathematical formulas, optimizations targeting some weird hardware, etc.) and let LLMs write comments for it. Once the code is complex enough and the LLMs still generate good enough comments, that will be a nice proof that LLMs are smart enough to justify banning humans from writing comments.

As a code snippet to test LLM commenting skills, I have this in mind:

// We want to disable compiler optimizations in this function because
// our target is a specific zk-VM, where performing 10 additions is faster
// than performing a multiplication (and compiler optimization would simplify
// this function to just
// ```
// int sum = num * 5;
// return sum + sum;
// ```
// because our zk-VM is new, its compiler is not smart enough yet and mostly
// copies the behaviour of compilers targeting x86).
// However, we do not want to disable all optimizations, because in other
// cases they are still useful.
#pragma GCC push_options
#pragma GCC optimize ("O0")  
int square(int num) {
    int sum = 0;
    for (int i = 0; i < 10; ++i) {
        sum += num;
    }
    return sum;
}
#pragma GCC pop_options

I think you can put some general design docs near the file with this code snippet (but remember, the zk-VM we are targeting is so brand new that an LLM has no idea what it is).

I highly doubt an LLM will write a good comment explaining why we want to disable compiler optimizations in this function. Also, I believe there will be engineers who need to write such a function and disable optimizations only in it.

P.S. I think some day your idea will be possible: a next-level LLM that scans all Slack messages, all issue and PR discussions, and all Zoom meetings will probably be able to write a good comment explaining why we want to disable compiler optimizations for this function, but LLMs definitely can't do it today. So if you prohibit comments now, I think you will make EO a bad choice for anything that has complicated and less-than-ideal parts.

P.P.S. However, banning non-LLM-written comments could be a good marketing move for EO, since LLMs are a popular topic today, so... it might attract some "vibe-coders" to EO =)

@MCJOHN974

One more example of a comment that is hard for an LLM to generate:

// It is a NUMS unspendable bitcoin public key <link to wikipedia post about NUMS>
const NUMS = [1, 2, 3, 4, 5, 6, ... 1]

I think it will be quite hard for an LLM to explain this hardcoded Bitcoin public key. You can argue that I could write it not hardcoded but properly, like this:

import hashlib
from functools import cache

@cache
def nums():
    phrase = "nothing up my sleeve".encode()
    return hashlib.sha256(phrase).digest()

Then an LLM can add a comment that makes the code easy to understand. But imagine this is a library and you don't want to add SHA-256 to your dependency list.

Here, too, I think current LLMs will not be able to generate a comment based on just the NUMS variable name (by the way, NUMS is a real and quite popular concept in Bitcoin; you can try LLMs on it).

@deemp
Member

deemp commented Mar 22, 2025

@yegor256

I believe comments play a crucial role in communicating the current knowledge to future readers. Arguably, the programmer knows better than a current LLM what information is worth putting into comments.

It'd be nice if comments of any length and format were allowed.

Moreover, you may want to implement a feature that is currently missing in the VSCode extension for Haskell - footnotes and their preview on hover.

https://t.me/lobste_rs/58341

Example:

https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/Language/Haskell/Syntax.hs?ref_type=heads#L18

https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/Language/Haskell/Syntax.hs?ref_type=heads#L45

@yegor256
Member Author

yegor256 commented Mar 23, 2025

@MCJOHN974 this is what Claude.ai Sonnet 3.7 gave me:

/**
 * @brief Computes the square of an integer using repeated addition.
 *
 * This function calculates the square of the input number by adding
 * the number to itself exactly 10 times. While this appears to calculate
 * 10 times the input rather than its square, the implementation likely
 * contains an error or serves a specific educational purpose.
 *
 * GCC optimization pragmas are used to disable compiler optimizations
 * for this function:
 * - @c #pragma GCC push_options - Saves the current compiler options
 * - @c #pragma GCC optimize ("O0") - Disables all optimizations
 * - @c #pragma GCC pop_options - Restores previously saved options
 *
 * Disabling optimizations ensures the loop executes exactly as written
 * without being transformed by the compiler (e.g., into a single
 * multiplication operation or completely optimized away). This is often
 * useful for:
 * - Debugging purposes
 * - Educational demonstrations
 * - Benchmarking
 * - Cases where optimizations might interfere with intended behavior
 *
 * @param num The integer to be "squared"
 * @return The result of adding the input to itself 10 times (num × 10),
 *         not the actual square (num × num)
 *
 * @note There appears to be a logical error in this implementation.
 * A correct square function would add the number to itself 'num' times,
 * not a fixed 10 times.
 *
 * @see GCC optimization options: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
 * @see GCC pragma documentation: https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html
 */
#pragma GCC push_options
#pragma GCC optimize ("O0")  
int square(int num) {
    int sum = 0;
    for (int i = 0; i < 10; ++i) {
        sum += num;
    }
    return sum;
}
#pragma GCC pop_options

Wouldn't this comment push the programmer to make the function's code better?

My point is that people usually write imperfect code and then make it "a bit better" via comments. If we prohibit comments, we force people to write perfect code. They will have to write it in such a way that an LLM perfectly understands their intent. Until then, they will have to improve the code, again and again.

@GeorgySabaev

GeorgySabaev commented Mar 23, 2025

This feels completely redundant, and potentially even harmful.
If you insist that every piece of code must be perfectly and unambiguously understandable by an LLM, you thereby insist that it must be just as perfectly and unambiguously understandable by a human, removing the need for the intermediary LLM step entirely.
If anything, these comments would only make poorly written programs even more difficult to read, considering they would now have not only confusing code but also a confident comment from an LLM that was confused by that very same code.
A comment-less program either contains all the information needed to understand the logic, or it doesn't; slapping a neural bandaid onto it won't help matters much.

@MyNameIsNeXTSTEP

MyNameIsNeXTSTEP commented Mar 23, 2025

@yegor256 Glad to see the invitation for discussion from the EO channel, thanks.

The points so far from the opposing opinions, as I see them:

  1. People write better and shorter comments than AI.
  2. Perfect code cannot be achieved by people anyway.
  3. Writing better code for LLMs often means making it more iterative than abstract (my personal point).
  4. We can allow comments as docs only at the beginning of program entities and prohibit them inside code blocks (inside objects).

And the last question: should a programming language itself restrict this part of the developer experience?

We can say that the language provides the instruments, while the programmer and the project decide which to use, e.g. whether to allow comments or not. If instead the language itself restricts even small things like this, making them big and strongly influential, the audience may shrink down to a very narrow group of people worldwide who love EOLang and its positions.

@MCJOHN974

@MCJOHN974 this is what Claude.ai Sonnet 3.7 gave me:

Yeah, I see that I forgot to change the function name to something more sensible. However, I think this comment from Claude even proves my point: an LLM for sure CAN see and highlight in comments that we are disabling compiler optimizations here. But engineers can probably also see that from the code; what they expect from a comment is the answer to why we are disabling these optimizations. An LLM can answer this question too, but to generate such an answer it needs a bigger scope: it should follow all Slack discussions, all Zoom calls, etc. I don't know any tool that can do that. And even when such a tool is invented, it will probably be behind a paywall, so it would be polite to wait until it becomes a standard developer tool, like git or GitHub, before forcing EO users to use it.

@MCJOHN974

And the last question: should a programming language itself restrict this part of the developer experience?

We can say that the language provides the instruments, while the programmer and the project decide which to use, e.g. whether to allow comments or not. If instead the language itself restricts even small things like this, making them big and strongly influential, the audience may shrink down to a very narrow group of people worldwide who love EOLang and its positions.

If you allow everything, you will probably reinvent C++, but we already have one. I think it is OK when a language forces you to avoid some bad practices; I just think there are a lot of cases where an LLM simply can't generate a good comment because it doesn't know everything it should.

@MCJOHN974

MCJOHN974 commented Mar 24, 2025

@yegor256 A few more questions about AI-generated comments:

Probably, in such a scenario, we want these comments to be stable, in the sense that a comment generated locally on my machine and one generated in CI will be the same. Is that possible with current LLMs? What if, for example, I have open-source code but a closed Telegram/Slack discussion about the project? Then I have two options: don't give the AI access to such chats (and then comments generated by external contributors and by internal project maintainers will differ), or don't allow the AI to see anything an external contributor can't see.

If you allow an AI-generated comment to differ depending on who generated it, then it will be quite hard to check in CI that a comment was not hand-written.

And even without external contributors: I can have local branches that I haven't pushed to GitHub yet, or code I haven't even added to git. All of this can change the state of my local LLM, and a difference between the state of my local LLM and the CI one can lead to different comments being generated.
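One way to make such a stability check concrete: pin the generator (model, version, temperature 0) and store a digest of its output next to the code, so CI can verify the committed comment byte-for-byte. A minimal sketch of the verification half, assuming some hypothetical pinned generator produced the comment and its digest:

```python
import hashlib

def comment_digest(comment: str) -> str:
    """Digest of a generated comment, stored alongside the code."""
    return hashlib.sha256(comment.encode("utf-8")).hexdigest()

def ci_check(committed_comment: str, stored_digest: str) -> bool:
    """CI passes only if the committed comment matches the digest of what
    the pinned generator produced; any hand edit changes the hash."""
    return comment_digest(committed_comment) == stored_digest

comment = "# Euclidean distance formula"
digest = comment_digest(comment)
assert ci_check(comment, digest)
assert not ci_check(comment + " (edited by hand)", digest)
```

This only detects tampering, of course; it does nothing about the harder problem raised above, that the generator's output depends on what context it was allowed to see.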

@MCJOHN974

This feels completely redundant, and potentially even harmful.
If you insist that every piece of code must be perfectly and unambiguously understandable by an LLM, you thereby insist that it must be just as perfectly and unambiguously understandable by a human, removing the need for the intermediary LLM step entirely.
If anything, these comments would only make poorly written programs even more difficult to read, considering they would now have not only confusing code but also a confident comment from an LLM that was confused by that very same code.
A comment-less program either contains all the information needed to understand the logic, or it doesn't; slapping a neural bandaid onto it won't help matters much.

What is good about EO here: it is a brand-new language, and it doesn't have huge codebases with bad code quality that still must be maintained because rewriting them from scratch is too expensive. Thus, while EO is still under development, it is possible to require all EO projects to start with certain quality standards and to force people to maintain those standards.

@MyNameIsNeXTSTEP

MyNameIsNeXTSTEP commented Mar 24, 2025

@MCJOHN974

If you allow everything, you will probably reinvent C++, but we already have one. I think it is OK when a language forces you to avoid some bad practices; I just think there are a lot of cases where an LLM simply can't generate a good comment because it doesn't know everything it should.

That's true for the "allow everything" part, but what I meant is not that; it's about not putting on the tool itself what should lie on the shoulders of the developers, its final users.

For example, in Python there's the ability (and a proposal in the PEPs, if I'm not mistaken) to write doc-comments on the first line after a function definition, and even without them the code in many projects may still be self-explanatory.


I see this prohibition of comments working only in the following manner:

  1. Code delivered to review doesn't contain comments.
  2. Code under review is strongly aimed to be small and clear enough for most developers in the project to understand.
  3. After the code review passes, the next job is to write doc-comments on the merged code and pass them through review too; this is crucial for controlling the docs and fixing them where needed.
  4. After all threads are resolved, push the docs to the project.

And here, at point 3, a well-known strong code-review process already ensures that the doc-comments contain exactly what we need and have been agreed upon, so developers don't need to write comments before the first code review.

@MyNameIsNeXTSTEP

@yegor256 A few more questions about AI-generated comments:

Probably, in such a scenario, we want these comments to be stable, in the sense that a comment generated locally on my machine and one generated in CI will be the same. Is that possible with current LLMs? What if, for example, I have open-source code but a closed Telegram/Slack discussion about the project? Then I have two options: don't give the AI access to such chats (and then comments generated by external contributors and by internal project maintainers will differ), or don't allow the AI to see anything an external contributor can't see.

If you allow an AI-generated comment to differ depending on who generated it, then it will be quite hard to check in CI that a comment was not hand-written.

And even without external contributors: I can have local branches that I haven't pushed to GitHub yet, or code I haven't even added to git. All of this can change the state of my local LLM, and a difference between the state of my local LLM and the CI one can lead to different comments being generated.

  • Why, in the first place, would you need to generate comments locally and check them against the ones from CI?
  • "It will be quite hard to check in CI that the comment was not hand-written" -- it isn't, because the CI job can clearly state what text it generated and output it. So if you'd like to know whether the comments are hand-written or CI-made, you just go to the CI job and see the logs, which only a program could have generated, and then check the code against them; there is no possibility you could hack that while the job is being run.
  • Why would the LLM need to know any outside data beyond the current documentation and the code, especially if they are not open source? I mean, the code, the current docs, and GitHub issues -- that's all it should get to work on; otherwise we lose the goal of making it automatic and having self-explanatory code, if we need pieces of developer communication beyond those closest to the project in the first place.

@MCJOHN974

@MyNameIsNeXTSTEP

Why would the LLM need to know any outside data beyond the current documentation and the code, especially if they are not open source? I mean, the code, the current docs, and GitHub issues -- that's all it should get to work on; otherwise we lose the goal of making it automatic and having self-explanatory code, if we need pieces of developer communication beyond those closest to the project in the first place.

Because if the LLM doesn't know any of your Zoom/Slack discussions, there are a lot of cases where it can't write a good comment. Imagine you developed a brand-new algorithm that didn't exist when the LLM was trained. Your code is a pile of arithmetic operations that look weird at first glance. How is the LLM supposed to write a comment about what all these operations do and why they lead to the result you are aiming for? How can the LLM explain the trade-offs and the decisions you made after looking at Grafana and discussing them in a Zoom meeting?

If there are already some human-written comments explaining it, then an LLM can do it, but AFAIU @yegor256 wants to ban humans from writing comments completely.

"It will be quite hard to check in CI that the comment was not hand-written" -- it isn't, because the CI job can clearly state what text it generated and output it. So if you'd like to know whether the comments are hand-written or CI-made, you just go to the CI job and see the logs, which only a program could have generated, and then check the code against them; there is no possibility you could hack that while the job is being run.

To be honest, I didn't understand this. But my point was that if CI wants to check whether a comment was LLM-generated, it needs to know the LLM's state. If we go back to the previous question, we see that the LLM's state should contain some information from Slack messages and Zoom calls. And there really are a ton of projects where the Zoom and Slack discussions are private but the repo is public. Yes, you can run this CI check privately, but it will be a pain for external contributors.

Why, in the first place, would you need to generate comments locally and check them against the ones from CI?

And this is a good question =)

Maybe that can work; perhaps it is possible to create a pipeline where a human writes code and pushes it, and then some LLM generates comments and pushes them to the same branch. But... it sounds like "you cannot write 100% ready code locally", which sounds really weird to me. I don't have constructive arguments against it at the moment, though, so maybe yes, it can work this way.

@MCJOHN974


  • Why, in the first place, would you need to generate comments locally and check them against the ones from CI?
  • "It will be quite hard to check in CI that the comment was not hand-written" -- it isn't, because the CI job can clearly state what text it generated and output it. So if you'd like to know whether the comments are hand-written or CI-made, you just go to the CI job and see the logs, which only a program could have generated, and then check the code against them; there is no possibility you could hack that while the job is being run.
  • Why would the LLM need to know any outside data beyond the current documentation and the code, especially if they are not open source? I mean, the code, the current docs, and GitHub issues -- that's all it should get to work on; otherwise we lose the goal of making it automatic and having self-explanatory code, if we need pieces of developer communication beyond those closest to the project in the first place.

This pipeline sounds like something that can work, and like something that beats my arguments. However, I still believe that with the current state of LLMs there will be cases where the LLM fails to write a comment, and this "review of LLM-written comments" will never lead to a merge.

Also, it is sometimes hard to review code without comments. You can argue that this is just because the code is bad, but I would not agree. At least something like PDD comments and todos -- I would like to see those during review.
