Added test environment functionality with test suite format proposition #312
base: master
Conversation
Pull request overview
This PR adds Python-based test infrastructure to replace the existing bash-based system, introducing REPL (Read-Eval-Print Loop) testing capabilities. The changes enable interactive testing of the REPL environment alongside traditional functional tests.
Changes:
- New Python test runner framework in the `runner/` directory with modular architecture
- REPL test support with interactive session management and output validation
- Example REPL test demonstrating non-fatal error handling
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 20 comments.
| File | Description |
|---|---|
| test/suite | New Python test suite configuration that defines test execution order and flags |
| test/repl/non_fatal_error0001.fram | Example REPL test file demonstrating error recovery and subsequent execution |
| runner/runner.py | Main test runner coordinating test execution and result aggregation |
| runner/repl.py | REPL test execution logic with interactive process management |
| runner/functional.py | Functional test execution for whole-program testing |
| runner/common.py | Shared utilities for parsing test files and validating expectations |
| .gitignore | Added Python bytecode cache directory exclusion |
| print(f"[TIMEOUT] Test '{test['code']}': {e}") | ||
| return False | ||
| except Exception as e: | ||
| print(f"[ERROR] Test '{test['code']}': {e}") |
Copilot AI · Jan 28, 2026
Broad `except Exception` silently swallows all errors. The exception handler on lines 48-50 catches every exception, so unrelated failures (I/O errors, bugs in the runner itself) are reported exactly like genuine test failures, which makes debugging difficult. Consider catching specific exceptions, or at least logging the exception type along with the message.
| print(f"[ERROR] Test '{test['code']}': {e}") | |
| print(f"[ERROR] Test '{test['code']}': {type(e).__name__}: {e}") |
```python
    for test in tests:
        expectations["stdout"].extend(test["expect"]["stdout"])
        expectations["stderr"].extend(test["expect"]["stderr"])
        expectations["stdout-empty"] = expectations["stdout-empty"] or test["expect"]["stdout-empty"]
        expectations["stderr-empty"] = expectations["stderr-empty"] or test["expect"]["stderr-empty"]
```
Copilot AI · Jan 28, 2026
The OR logic for combining empty expectations is incorrect. On lines 94-95, the code uses OR to combine stdout-empty and stderr-empty flags. This means if any single expectation in the file specifies empty output, it will override all other expectations. This is likely a bug - either these flags should use AND logic (all expectations must agree), or the parsing should reject files that mix empty and non-empty expectations for the same stream.
Suggested change:

```diff
-    for test in tests:
-        expectations["stdout"].extend(test["expect"]["stdout"])
-        expectations["stderr"].extend(test["expect"]["stderr"])
-        expectations["stdout-empty"] = expectations["stdout-empty"] or test["expect"]["stdout-empty"]
-        expectations["stderr-empty"] = expectations["stderr-empty"] or test["expect"]["stderr-empty"]
+    # Track aggregate emptiness: start assuming streams can be empty,
+    # then clear the flags if any test expects non-empty output or
+    # does not mark the stream as empty.
+    stdout_empty_all = True
+    stderr_empty_all = True
+    for test in tests:
+        # Collect all expected substrings
+        expectations["stdout"].extend(test["expect"]["stdout"])
+        expectations["stderr"].extend(test["expect"]["stderr"])
+        # Combine empty flags with AND semantics
+        stdout_empty_all = stdout_empty_all and test["expect"]["stdout-empty"]
+        stderr_empty_all = stderr_empty_all and test["expect"]["stderr-empty"]
+        # If any explicit non-empty expectation exists for a stream,
+        # that stream cannot be globally empty.
+        if test["expect"]["stdout"]:
+            stdout_empty_all = False
+        if test["expect"]["stderr"]:
+            stderr_empty_all = False
+    # Finalize empty expectations only if all tests agree and there are
+    # no conflicting non-empty expectations.
+    if tests:
+        expectations["stdout-empty"] = stdout_empty_all
+        expectations["stderr-empty"] = stderr_empty_all
```
```python
def read_until_prompt(proc, timeout=TIMEOUT):
    """Reads output until prompt is found or timeout occurs."""
    output = ""
    start_time = time.time()

    while time.time() - start_time < timeout:
        try:
            c = proc.stdout.read(1)
            if not c:  # EOF
                break
            output += c
            if output.endswith(PROMPT):
                return output
```
Copilot AI · Jan 28, 2026
Reading output character-by-character is inefficient and may cause performance issues. The function reads one character at a time in a tight loop, which is very slow for larger outputs. Consider reading larger chunks (e.g., using readline() or read() with a buffer size) and then searching for the prompt in the accumulated output.
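A minimal sketch of one chunked alternative, assuming a POSIX platform, that the REPL process is spawned with binary (non-text) pipes so `os.read` sees all of the data, and that `PROMPT` and `TIMEOUT` are the constants already defined in the PR:

```python
import os
import select
import time

def read_until_prompt(proc, timeout=TIMEOUT):
    """Read available output in chunks until PROMPT appears or the timeout elapses."""
    output = ""
    fd = proc.stdout.fileno()
    deadline = time.time() + timeout
    while time.time() < deadline:
        # Block only until data is available or the remaining time runs out.
        ready, _, _ = select.select([fd], [], [], max(0.0, deadline - time.time()))
        if not ready:
            break  # timed out waiting for more output
        chunk = os.read(fd, 4096)
        if not chunk:  # EOF
            break
        output += chunk.decode(errors="replace")
        if output.endswith(PROMPT):
            return output
    return output
```

On timeout or EOF this returns whatever was collected, so the caller can still report the partial output.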
```python
    except Exception:
        proc.kill()
```
Copilot AI · Jan 28, 2026
Another broad `except Exception` that silently handles all errors. The handler on lines 58-59, run during process termination, catches every exception without logging or distinguishing error types, which could hide important cleanup failures.
Suggested change:

```diff
-    except Exception:
-        proc.kill()
+    except subprocess.TimeoutExpired:
+        # If the process does not exit in time, force kill it.
+        proc.kill()
+    except Exception as e:
+        # Log unexpected termination errors, then ensure the process is killed.
+        print(f"[WARN] Failed to terminate process cleanly: {e}")
+        proc.kill()
```
```python
            output += c
            if output.endswith(PROMPT):
                return output
        except Exception:
```
Copilot AI · Jan 28, 2026
Broad `except Exception` catches all exceptions and may hide important errors. The catch-all handler on line 20 suppresses any failure in the read loop, including genuine I/O problems, and makes them indistinguishable from a normal timeout. Consider catching more specific exceptions (e.g., `OSError`) or at least re-raising ones that should not be swallowed.
Suggested change:

```diff
-        except Exception:
+        except OSError:
```
```python
    cmd = f"{binary} {flags} {file}"
    expectations = parse_expectations(file)

    try:
        start_time = time.time()
        proc = subprocess.run(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            timeout=timeout,
```
Copilot AI · Jan 28, 2026
Using 'shell=True' in subprocess.run is a security risk. Passing shell=True allows shell injection attacks if any of the parameters (binary, flags, or file) contain user-controlled input. Consider using a list of arguments instead of shell=True, for example: subprocess.run([binary] + shlex.split(flags) + [file], ...).
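A minimal sketch of the argument-list form, assuming `binary`, `flags`, `file`, and `timeout` are the same values already in scope in the runner:

```python
import shlex
import subprocess
import time

# Build an argument list instead of a shell string; shlex.split handles
# multi-word flag strings, and since no shell is involved, nothing in the
# paths can be interpreted as shell syntax.
args = [binary] + shlex.split(flags) + [file]
start_time = time.time()
proc = subprocess.run(
    args,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
    timeout=timeout,
)
```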
```python
with_flags("", simple_examples)

simple_run_tests("test/stdlib/*.fram",flags ='')
```
Copilot AI · Jan 28, 2026
Inconsistent spacing around the '=' operator. Line 1 has 'flags = ' with spaces, while line 14 has 'flags =' with space before but not after. This should be consistent throughout the file, using spaces on both sides of the equals sign.
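For example, PEP 8 keyword-argument style (no spaces around `=`) applied consistently to the visible call sites would look like:

```python
with_flags("", simple_examples)
simple_run_tests("test/stdlib/*.fram", flags='')
```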
```python
    try:
        tests = parse_tests(file)
        read_until_prompt(proc, timeout=timeout)  # consume initial prompt

        for test in tests:
            if not _run_single_test(proc, test, timeout):
                return False

        return True
```
Copilot AI · Jan 28, 2026
Empty test files will pass silently. If 'parse_tests' returns an empty list (because the file has no code lines), the for loop on line 78 will not execute, and the function will return True, making the test pass. This could hide misconfigured or incorrectly formatted test files. Consider adding a check to ensure at least one test was found, or documenting that empty test files are valid.
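A minimal sketch of such a guard, assuming the surrounding function signals failure by returning `False` as the rest of the PR's runner does:

```python
tests = parse_tests(file)
if not tests:
    # An empty or unparseable test file is almost certainly a mistake;
    # fail loudly instead of passing vacuously.
    print(f"[ERROR] No tests found in '{file}'")
    return False
```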
test/suite (Outdated)
| print("\n--- REPL TESTS ---\n") | ||
| repl_tests("test/repl/*.fram", flags='') | ||
| print("\n--- REPL TESTS DONE ---\n") | ||
| repl_tests("test/ok/*.fram", flags='') | ||
| print("\n--- REPL TESTS DONE ---\n") |
Copilot AI · Jan 28, 2026
Debug print statements are left in the production code. These print statements outputting "--- REPL TESTS ---" and "--- REPL TESTS DONE ---" should either be removed or made optional through a verbose flag, as they clutter the test output and are not consistent with the existing test infrastructure.
| print("\n--- REPL TESTS ---\n") | |
| repl_tests("test/repl/*.fram", flags='') | |
| print("\n--- REPL TESTS DONE ---\n") | |
| repl_tests("test/ok/*.fram", flags='') | |
| print("\n--- REPL TESTS DONE ---\n") | |
| repl_tests("test/repl/*.fram", flags='') | |
| repl_tests("test/ok/*.fram", flags='') |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Adds ability to test REPL interactions and refactors test runner infrastructure.