
Conversation

@Xaro8 Xaro8 commented Jan 28, 2026

Adds the ability to test REPL interactions and refactors the test runner infrastructure.

Copilot AI left a comment

Pull request overview

This PR adds Python-based test infrastructure to replace the existing bash-based system, introducing REPL (Read-Eval-Print Loop) testing capabilities. The changes enable interactive testing of the REPL environment alongside traditional functional tests.

Changes:

  • New Python test runner framework in runner/ directory with modular architecture
  • REPL test support with interactive session management and output validation
  • Example REPL test demonstrating non-fatal error handling

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 20 comments.

Summary per file:

  • test/suite: New Python test suite configuration that defines test execution order and flags
  • test/repl/non_fatal_error0001.fram: Example REPL test file demonstrating error recovery and subsequent execution
  • runner/runner.py: Main test runner coordinating test execution and result aggregation
  • runner/repl.py: REPL test execution logic with interactive process management
  • runner/functional.py: Functional test execution for whole-program testing
  • runner/common.py: Shared utilities for parsing test files and validating expectations
  • .gitignore: Added Python bytecode cache directory exclusion

print(f"[TIMEOUT] Test '{test['code']}': {e}")
return False
except Exception as e:
print(f"[ERROR] Test '{test['code']}': {e}")

Copilot AI Jan 28, 2026

Bare 'except Exception' makes all failures look alike. The exception handler on lines 48-50 catches every exception derived from Exception, so runner bugs and I/O errors are reported identically to genuine test failures, which makes debugging difficult. Consider catching specific exceptions, or at least logging the exception type alongside the message.

Suggested change
-        print(f"[ERROR] Test '{test['code']}': {e}")
+        print(f"[ERROR] Test '{test['code']}': {type(e).__name__}: {e}")

Comment on lines +91 to +96
    for test in tests:
        expectations["stdout"].extend(test["expect"]["stdout"])
        expectations["stderr"].extend(test["expect"]["stderr"])
        expectations["stdout-empty"] = expectations["stdout-empty"] or test["expect"]["stdout-empty"]
        expectations["stderr-empty"] = expectations["stderr-empty"] or test["expect"]["stderr-empty"]

Copilot AI Jan 28, 2026

The OR logic for combining empty expectations is incorrect. On lines 94-95, the code uses OR to combine stdout-empty and stderr-empty flags. This means if any single expectation in the file specifies empty output, it will override all other expectations. This is likely a bug - either these flags should use AND logic (all expectations must agree), or the parsing should reject files that mix empty and non-empty expectations for the same stream.

Suggested change
-    for test in tests:
-        expectations["stdout"].extend(test["expect"]["stdout"])
-        expectations["stderr"].extend(test["expect"]["stderr"])
-        expectations["stdout-empty"] = expectations["stdout-empty"] or test["expect"]["stdout-empty"]
-        expectations["stderr-empty"] = expectations["stderr-empty"] or test["expect"]["stderr-empty"]
+    # Track aggregate emptiness: start assuming streams can be empty,
+    # then clear the flags if any test expects non-empty output or
+    # does not mark the stream as empty.
+    stdout_empty_all = True
+    stderr_empty_all = True
+    for test in tests:
+        # Collect all expected substrings
+        expectations["stdout"].extend(test["expect"]["stdout"])
+        expectations["stderr"].extend(test["expect"]["stderr"])
+        # Combine empty flags with AND semantics
+        stdout_empty_all = stdout_empty_all and test["expect"]["stdout-empty"]
+        stderr_empty_all = stderr_empty_all and test["expect"]["stderr-empty"]
+        # If any explicit non-empty expectation exists for a stream,
+        # that stream cannot be globally empty.
+        if test["expect"]["stdout"]:
+            stdout_empty_all = False
+        if test["expect"]["stderr"]:
+            stderr_empty_all = False
+    # Finalize empty expectations only if all tests agree and there are
+    # no conflicting non-empty expectations.
+    if tests:
+        expectations["stdout-empty"] = stdout_empty_all
+        expectations["stderr-empty"] = stderr_empty_all

Comment on lines +7 to +19
def read_until_prompt(proc, timeout=TIMEOUT):
    """Reads output until prompt is found or timeout occurs."""
    output = ""
    start_time = time.time()

    while time.time() - start_time < timeout:
        try:
            c = proc.stdout.read(1)
            if not c:  # EOF
                break
            output += c
            if output.endswith(PROMPT):
                return output

Copilot AI Jan 28, 2026

Reading output character-by-character is inefficient. The function reads one character at a time in a tight loop, which is slow for larger outputs. Consider reading larger chunks (e.g., using readline() or read() with a buffer size) and then searching for the prompt in the accumulated output.

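A chunked variant of the read loop might look like the sketch below. It is not the PR's implementation: it assumes a POSIX platform (select() on pipe descriptors), a process started with stdout=subprocess.PIPE and bufsize=0 so the raw descriptor sees all output, and stand-in values for the PROMPT and TIMEOUT constants that runner/repl.py defines.

import os
import select
import time

PROMPT = "> "   # stand-in; runner/repl.py defines the real prompt
TIMEOUT = 5     # stand-in default timeout, in seconds

def read_until_prompt_chunked(proc, timeout=TIMEOUT):
    """Read proc's stdout in chunks until PROMPT appears or timeout expires."""
    fd = proc.stdout.fileno()
    prompt = PROMPT.encode()
    output = b""
    deadline = time.monotonic() + timeout

    while time.monotonic() < deadline:
        # Sleep until data is readable or the remaining time runs out.
        remaining = deadline - time.monotonic()
        ready, _, _ = select.select([fd], [], [], max(remaining, 0))
        if not ready:
            break  # deadline reached with no new data
        chunk = os.read(fd, 4096)  # up to 4 KiB per syscall
        if not chunk:
            break  # EOF: the REPL closed its stdout
        output += chunk
        # Checking the accumulated buffer handles prompts that straddle
        # chunk boundaries.
        if output.endswith(prompt):
            break
    return output.decode(errors="replace")

Reading through the raw descriptor bypasses Python's stream buffering, which is why the sketch requires bufsize=0 on the Popen call.
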
Comment on lines +58 to +59
    except Exception:
        proc.kill()

Copilot AI Jan 28, 2026

Another bare 'except Exception' that silently handles all errors. The exception handler on lines 58-59 during process termination catches all exceptions without logging or distinguishing between different error types. This could hide important cleanup failures.

Suggested change
-    except Exception:
-        proc.kill()
+    except subprocess.TimeoutExpired:
+        # If the process does not exit in time, force kill it.
+        proc.kill()
+    except Exception as e:
+        # Log unexpected termination errors, then ensure the process is killed.
+        print(f"[WARN] Failed to terminate process cleanly: {e}")
+        proc.kill()

            output += c
            if output.endswith(PROMPT):
                return output
        except Exception:

Copilot AI Jan 28, 2026

Bare 'except Exception' catches all exceptions and may hide important errors. The catch-all handler on line 20 turns any read failure into an apparent timeout, suppressing genuine I/O errors. Consider catching more specific exceptions (e.g., OSError) or re-raising ones that should abort the run.

Suggested change
-        except Exception:
+        except OSError:

Comment on lines +16 to +27
    cmd = f"{binary} {flags} {file}"
    expectations = parse_expectations(file)

    try:
        start_time = time.time()
        proc = subprocess.run(
            cmd,
            shell=True,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            timeout=timeout,

Copilot AI Jan 28, 2026

Using 'shell=True' in subprocess.run is a security risk. Passing shell=True allows shell injection attacks if any of the parameters (binary, flags, or file) contain user-controlled input. Consider using a list of arguments instead of shell=True, for example: subprocess.run([binary] + shlex.split(flags) + [file], ...).

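For illustration, a list-based invocation might look like this sketch; run_program is a hypothetical stand-in for the function in runner/functional.py, and the parameter names mirror the quoted snippet.

import shlex
import subprocess

def run_program(binary, flags, file, timeout):
    # An argv list spawns no shell, so whitespace or metacharacters in
    # `file` are passed through literally rather than interpreted.
    argv = [binary] + shlex.split(flags) + [file]
    return subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        timeout=timeout,
    )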

with_flags("", simple_examples)

simple_run_tests("test/stdlib/*.fram",flags ='')

Copilot AI Jan 28, 2026

Inconsistent spacing around the '=' operator. Line 1 has 'flags = ' with spaces, while line 14 has 'flags =' with space before but not after. This should be consistent throughout the file, using spaces on both sides of the equals sign.

Comment on lines +74 to +82
    try:
        tests = parse_tests(file)
        read_until_prompt(proc, timeout=timeout)  # consume initial prompt

        for test in tests:
            if not _run_single_test(proc, test, timeout):
                return False

        return True

Copilot AI Jan 28, 2026

Empty test files will pass silently. If 'parse_tests' returns an empty list (because the file has no code lines), the for loop on line 78 will not execute, and the function will return True, making the test pass. This could hide misconfigured or incorrectly formatted test files. Consider adding a check to ensure at least one test was found, or documenting that empty test files are valid.

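One possible guard, sketched with the names from the quoted snippet (parse_tests, read_until_prompt, _run_single_test); the wrapping function and error message format are illustrative:

def run_repl_file(proc, file, timeout):
    tests = parse_tests(file)
    if not tests:
        # An empty parse usually means a misformatted test file rather
        # than an intentionally empty one, so fail loudly.
        print(f"[ERROR] No tests parsed from '{file}'")
        return False
    read_until_prompt(proc, timeout=timeout)  # consume initial prompt
    # all() short-circuits, preserving the stop-on-first-failure behavior.
    return all(_run_single_test(proc, test, timeout) for test in tests)
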
test/suite (Outdated)
Comment on lines 25 to 29
print("\n--- REPL TESTS ---\n")
repl_tests("test/repl/*.fram", flags='')
print("\n--- REPL TESTS DONE ---\n")
repl_tests("test/ok/*.fram", flags='')
print("\n--- REPL TESTS DONE ---\n")

Copilot AI Jan 28, 2026

Debug print statements are left in the production code. These print statements outputting "--- REPL TESTS ---" and "--- REPL TESTS DONE ---" should either be removed or made optional through a verbose flag, as they clutter the test output and are not consistent with the existing test infrastructure.

Suggested change
-print("\n--- REPL TESTS ---\n")
-repl_tests("test/repl/*.fram", flags='')
-print("\n--- REPL TESTS DONE ---\n")
-repl_tests("test/ok/*.fram", flags='')
-print("\n--- REPL TESTS DONE ---\n")
+repl_tests("test/repl/*.fram", flags='')
+repl_tests("test/ok/*.fram", flags='')

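If the headers are useful during development, they could instead be gated behind a verbosity switch. A sketch, where VERBOSE and banner are hypothetical additions and repl_tests comes from the snippet above:

VERBOSE = False  # hypothetical switch, e.g. set from a command-line flag

def banner(text):
    # Emit section markers only when verbose output is requested.
    if VERBOSE:
        print(f"\n--- {text} ---\n")

banner("REPL TESTS")
repl_tests("test/repl/*.fram", flags='')
repl_tests("test/ok/*.fram", flags='')
banner("REPL TESTS DONE")
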
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@ppolesiuk ppolesiuk mentioned this pull request Feb 9, 2026