Commit d81a83f

enh: edits from review

1 parent 57c4182 commit d81a83f

File tree

1 file changed: +8 −8 lines changed


_posts/2025-11-18-generative-ai-peer-review.md

Lines changed: 8 additions & 8 deletions
@@ -36,7 +36,7 @@ LLMs are perceived as helping developers:
 * In some cases, simplifying language barriers for participants in open source around the world
 * Speeding up everyday workflows
 
-Some contributors also perceive these products as making open source more accessible. However, LLMs also present
+Some contributors also believe these products make open source more accessible. And for some, maybe they do. However, LLMs also present
 unprecedented social and environmental challenges.
 
 ### Incorrectness of LLMs and misleading time benefits
@@ -88,20 +88,20 @@ The problem? LLMs sometimes spit out near-exact copies of that training data, bu
 Why this matters:
 
 * LLM-generated code may be *substantially similar* to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is compared to the original.
-* License conflicts can occur if your package's license (e.g., MIT) is incompatible with code patterns that the LLM learned on such as code licensed as GPL or Apache-2.0.
-* * You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it.
+* You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it.
+* License conflicts can occur because of both items above. Read on...
 
 When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements.
 
-The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology.
+The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology.
 
 **What you can do now:**
 
-*Prefer human-edited, transformative outputs you fully understand*
-
 * Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts.
-* Be aware that LLM products can potentially return copyrighted code verbatim. **Don't paste LLM outputs directly into your code**. Instead, review, edit, and transform anything an LLM gives you. Consider using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design) to achieve this.
-* **Make sure you fully understand the code before using it:** This is actually in your best interest because you can learn a lot about programming by asking an LLM questions and reviewing the output critically.
+* Understand and transform code that is returned from an LLM: Don't paste LLM outputs directly. Review, edit, and ensure you fully understand what you're using. You can ask the LLM questions to better understand its outputs. This approach also helps you learn, which addresses the education concerns that we raised earlier.
+* **Use LLMs as learning tools**: Ask questions, review outputs critically, then write your own implementation based on understanding. Often the outputs of LLMs are messy or inefficient. Use them to learn, not to copy.
+* Consider [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have one person review LLM suggestions for approach; have another person implement from that high-level description.
+* **Document your process**: If you plan to submit a Python package for pyOpenSci review, we will ask you about your use of LLMs in your work. Document the use of LLMs in your project's README file and in any modules where LLM outputs have been applied. Confirm that it has been reviewed by a human prior to submitting it to us, or to any other volunteer-led peer review process.
 
 You can't control what's in training data, but you can be thoughtful about how you use these tools.
 