Support more easily running/managing schema project tests #17

Open
stevedlawrence opened this issue Jan 24, 2024 · 4 comments

Comments

@stevedlawrence
Member

The convention for testing schema projects is to write all tests in a TDML file and then create a Scala file that uses JUnit to run those tests. This leads to a bit of duplication and verbosity. It would be nice if we could reduce this to make running tests easier. Some possibilities discussed:

  1. An idea might be to automatically generate this kind of file in a test-managed directory during sbt compile (or sbt Test/compile or sbt test). Then we still can open this kind of Scala file in our IDEs to debug individual tests within the debugger, but our responsibility for creating and updating these files is handed over to the sbt plugin.

  2. Do something like this:

    @Test def test_test01(): Unit = doTest(runner)

    And the doTest function looks at the stack to figure out which function called it, strips off the "test_" prefix, and calls runner.runOneTest, passing in the resulting name. It still requires managing a Scala file, but it's a bit less verbose.

  3. I wonder whether, now that IDEs have much better sbt support, creating an sbt test interface for TDML would just magically give IDEs the capabilities we expect out of them (e.g. a list of tests, the ability to right-click a location in a TDML file to run a test).
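A rough sketch of how the doTest helper from option 2 might find its caller on the stack. TestRunner here is a hypothetical stand-in for the TDML runner (only runOneTest(name) comes from the discussion), and StackNaming and ExampleTests are made-up names for illustration:

```scala
// Hypothetical stand-in for the TDML runner; only runOneTest(name) is taken from the thread.
trait TestRunner {
  def runOneTest(name: String): Unit
}

object StackNaming {
  // Name of the helper method, used to locate its own frame on the stack.
  private val helperName = "doTest"

  /** Runs the TDML test named after the method that called doTest. */
  def doTest(runner: TestRunner): Unit = {
    val frames = Thread.currentThread().getStackTrace.toList
    // Skip everything up to and including the doTest frame; the next frame is the caller.
    val caller = frames
      .dropWhile(_.getMethodName != helperName)
      .dropWhile(_.getMethodName == helperName)
      .headOption
      .getOrElse(sys.error("could not determine the calling test method"))
    // Strip the "test_" prefix so test_test01 runs the TDML test "test01".
    runner.runOneTest(caller.getMethodName.stripPrefix("test_"))
  }
}

// Usage: the method name is the only place the test name appears.
// In a real project this method would also carry JUnit's @Test annotation.
class ExampleTests(runner: TestRunner) {
  def test_test01(): Unit = StackNaming.doTest(runner)
}
```

This only works if doTest is called directly from the test method; calling it from a lambda or another helper would pick up the wrong frame.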

@stevedlawrence
Member Author

I looked at sbt's test-interface to see what capabilities it provides for custom test infrastructures and whether option 3 is a possibility. The idea is that we would no longer use JUnit and would instead use a custom test infrastructure specific to TDML files. The test-interface API is actually pretty small, but that also means it's pretty limited.

First, it requires that test suites are defined in code, so there's no way to say something like "treat each .tdml file as a test suite". Instead, it only supports classes that either extend a superclass or are annotated. But you could imagine having one or a few Scala files containing all test suites like this:

import org.apache.daffodil.tdml.TDMLSuite

class TestSuiteA extends TDMLSuite("/org/example/a.tdml")
class TestSuiteB extends TDMLSuite("/org/example/b.tdml")
class TestSuiteC extends TDMLSuite("/org/example/c.tdml")
...

This way you wouldn't need to define all the tests, you just associate each suite with a tdml file. Much less verbose with less boilerplate.

When running tests via sbt, we would tell sbt how to discover these classes (i.e. anything that extends TDMLSuite), and it would find them and pass the names to a custom test runner class that we implement. This test runner would then create an instance of each class, read the associated TDML file, and execute the tests, passing information back to sbt about the name and result of each test we ran in the suite. We would need some way for a TDML file to say "this test is broken, skip it" (right now we just comment out tests), but other than that it would pretty drastically reduce boilerplate.
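As a rough illustration of the "read the associated tdml file" step, here is a sketch that pulls test names out of a TDML document using only the JDK's DOM parser. TdmlTestNames is a made-up name, and this is not how the existing TDML runner works; it assumes every test case element carries a name attribute and ignores everything else in the file:

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory

object TdmlTestNames {
  // TDML test case elements that a custom runner would report as individual tests.
  private val testElements = Seq("parserTestCase", "unparserTestCase")

  /** Returns the name of every test case, parser tests first, then unparser tests. */
  def fromXml(xml: String): Seq[String] = {
    val factory = DocumentBuilderFactory.newInstance()
    // Namespace awareness lets us match on local names regardless of prefix.
    factory.setNamespaceAware(true)
    val doc = factory.newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    for {
      element <- testElements
      nodes = doc.getElementsByTagNameNS("*", element)
      i <- 0 until nodes.getLength
    } yield nodes.item(i).getAttributes.getNamedItem("name").getNodeValue
  }
}
```

A custom runner would report each of these names and its result back to sbt; that part depends on the test-interface API and is left out here.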

The major downside of this approach is that no IDEs would support it. I was hoping the sbt test API would allow a way to pass information back about the tests so IDEs could list them, know which file/line they are at to enable right-clicking in a .tdml file to execute, etc., but it just doesn't provide that information. So this approach would also need custom plugins for any IDEs we want to support, which is probably a deal breaker.

So I think idea 3 is out.

@mbeckerle

I depend on having a line of scala code that I can click on and say "debug", with breakpoints set in layers, UDFs or other code artifacts of the schema.

Using Li Haoyi's sourcecode library I was able to create this kind of test line:

@Test def test_test01(): Unit = go

The go function uses the sourcecode library to get the name of the enclosing method. A "test_" prefix, if present, is stripped off, leaving just "test01", which is then used as the TDML test case name.

I also really like doing

@Test def test_test01(): Unit = runner.trace.runOneTest("test01")

That turns on the trace mode of the debugger. I find most schema bugs this way.

Both techniques could be combined to eliminate the redundancy of test case names, as well as providing the trace behavior.

Last point: We frequently find a bug, create TDML tests for it, then do:

// DAFFODIL-XYZY
// @Test def test_test01(): Unit = runner.runOneTest("test01")

so that we can commit and merge a test that illustrates the bug, but doesn't run, so it doesn't break the build.
I really like this split where you can add a test independently of creating, or even designing or thinking about, the fix.

This could be achieved by some technique of editing the TDML file however. Worst case commenting the whole test out and putting the JIRA ticket ID into the comment.

I do think there is some divergence of TDML files and the scala driver files at this point. Many tests appear to be present in the TDML which are never exercised by the scala driver files. This can be confusing. Tests in TDML with associated schemas can seem to contradict the DFDL spec, and only subsequently do you find out those TDML tests are not being used any more.

Actual suggestion/idea:

What if the scala driver files are generated from the TDML when needed? Like sbt testGen creates the test driver in test/src_managed, but this is manually invoked by developers who want the drivers so they can easily trace or debug just one test.
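A minimal sketch of what the generation step could look like, assuming the test names have already been read from the TDML file. DriverGen and its methods are hypothetical names, the generated lines follow the style quoted in this thread, and `runner` is assumed to be defined elsewhere for the TDML file in question:

```scala
object DriverGen {
  /** Renders one JUnit-style test line for a TDML test name. */
  def testLine(name: String): String =
    s"""  @Test def test_$name(): Unit = runner.runOneTest("$name")"""

  /** Renders a driver class with one test line per distinct TDML test name. */
  def driverSource(className: String, testNames: Seq[String]): String = {
    val body = testNames.distinct.map(testLine).mkString("\n")
    s"""// Generated from TDML; do not edit.
       |class $className {
       |  // `runner` is assumed to be defined elsewhere for this TDML file.
       |$body
       |}
       |""".stripMargin
  }
}
```

Since the driver is always derived from the TDML, it cannot drift from it. One caveat: TDML test names that are not valid Scala identifiers (hyphens, for example) would need escaping, which this sketch does not handle.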

@jw3
Member

jw3 commented Jan 24, 2024

I did a thing a while back, could help as a reference impl: https://github.com/jw3/sbt-tdml

Mailing list discussion: https://lists.apache.org/thread/r0fnzvsk10cqpnkjbr1q528z7kz0ngpc

@stevedlawrence
Member Author

I'm not suggesting we drop the current style, but I think a lot of Mike's concerns could theoretically be resolved, and I think would make things better:

I depend on having a line of scala code that I can click on and say "debug", with breakpoints set in layers, UDFs or other code artifacts of the schema.

I think it would actually be much more convenient to be able to right-click a parserTestCase in a TDML file and say "run" or "debug", and behind the scenes it calls runOneTest. I'm not sure if an IDE plugin could support that though, or if they are more code-centric like sbt. But if it could, all the schemas/inputs/outputs/etc. are right there in what you're already looking at. One of the things I dislike about the Scala file separation is having to find the runner and TDML file associated with a failing Scala test; it's a multi-step process.

I also really like doing

@Test def test_test01(): Unit = runner.trace.runOneTest("test01")

This could be supported by adding something like a trace="true" attribute to parserTestCase element, which could cause the test interface to enable tracing before running the test.

Last point: We frequently find a bug, create TDML tests for it, then do:

// DAFFODIL-XYZY
// @Test def test_test01(): Unit = runner.runOneTest("test01")

...

This could be achieved by some technique of editing the TDML file however. Worst case commenting the whole test out and putting the JIRA ticket ID into the comment.

I think editing the TDML actually has benefits. For example, you could have an attribute like broken="DAFFODIL-123". This means that if a test is broken it now requires there be a referenced bug, and the existence of the attribute would tell the test-interface to skip it. It would also allow you to do something like this:

sbt "testOnly -- --broken"

To run only the broken tests to see if any tests have been fixed without realizing it.
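A sketch of the selection logic this implies. TdmlTest and BrokenSelection are hypothetical names; the only thing taken from the proposal is that a test with a broken attribute is skipped by default and run when the broken flag is given:

```scala
// Hypothetical model of a TDML test case; `broken` holds the referenced ticket ID, if any.
final case class TdmlTest(name: String, broken: Option[String] = None)

object BrokenSelection {
  /** A default run skips broken tests; onlyBroken = true runs just the broken ones. */
  def select(tests: Seq[TdmlTest], onlyBroken: Boolean): Seq[TdmlTest] =
    tests.filter(_.broken.isDefined == onlyBroken)
}
```

With this shape, a broken test that unexpectedly passes under the broken-only run is a signal that its ticket may already be fixed.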

Getting rid of Scala files also avoids issues where tests are not run because we forgot to add a @Test for them, or where we run the same test multiple times because we copy-pasted an existing test without changing the names. I do this frequently.
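For projects that keep the Scala files, the same problems could at least be detected. A small sketch, assuming we can collect the test names defined in the TDML and the names passed to runOneTest in the driver (DriverDrift and Report are made-up names):

```scala
object DriverDrift {
  /** Differences between the tests a TDML file defines and the tests a driver runs. */
  final case class Report(
    neverRun: Set[String],        // defined in the TDML but never called from the driver
    runMoreThanOnce: Set[String], // called from the driver more than once
    unknown: Set[String]          // called from the driver but not defined in the TDML
  )

  def check(tdmlNames: Seq[String], driverCalls: Seq[String]): Report = {
    val defined = tdmlNames.toSet
    val called = driverCalls.toSet
    val repeated = driverCalls
      .groupBy(identity)
      .collect { case (name, uses) if uses.size > 1 => name }
      .toSet
    Report(defined -- called, repeated, called -- defined)
  }
}
```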

But as I said before, this all requires a unique test-interface plugin for each thing we support, which is at least sbt and one or more IDEs, which is probably too much effort.

I did a thing a while back, could help as a reference impl: https://github.com/jw3/sbt-tdml

This is very cool, I forgot about this. This could be very useful, and the test-interface code is very small and maintainable. Though I imagine we really just want the TDML generation aspect, and have it generate our current, verbose format with all the @Tests.

Another potential issue with the test-interface approach, which looking at sbt-tdml reminded me of, is that sbt doesn't separate plugin/test-interface dependencies from normal dependencies. So for example, if a DFDL schema project depends on one version of daffodil-tdml and the sbt-tdml test-interface depends on another, we can run into problems. I wonder if we could resolve this by saying that the sbt-tdml test interface has a "provided" dependency on daffodil-tdml-processor. Then sbt-daffodil adds whatever version the user specifies in daffodilVersion and the test-interface will just use that. And as long as the daffodil-tdml API stays stable (which it pretty much is), this would work.
