There are three separate processes for generating and validating UCDXML files and their corresponding UAX42 report.
- Generate the UCDXML files.
- (Optional) Compare the generated UCDXML files against each other (e.g., flat vs. grouped) or against previous versions.
- Generate UAX42. There are three steps involved:
  - Generate the property value fragments. The updated versions should live in unicodetools/src/main/resources/org/unicode/uax42/fragments.
  - Generate the index.html and index.rnc files for UAX42.
  - (Optional) Validate the UCDXML files using index.rnc.
- You can generate flat or grouped versions of UCDXML (--output FLAT or --output GROUPED).
- You can generate UCDXML files for:
  - the full range of code points (--range ALL)
  - the Unihan code points (--range UNIHAN)
  - code points that are not Unihan code points (--range NOUNIHAN)
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range ALL --output FLAT"' -DCLDR_DIR=$(cd ../cldr; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range UNIHAN --output FLAT"' -DCLDR_DIR=$(cd ../cldr; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range NOUNIHAN --output FLAT"' -DCLDR_DIR=$(cd ../cldr; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range ALL --output GROUPED"' -DCLDR_DIR=$(cd ../cldr; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range UNIHAN --output GROUPED"' -DCLDR_DIR=$(cd ../cldr; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.UCDXML"' '-Dexec.args="--range NOUNIHAN --output GROUPED"' -DCLDR_DIR=$(cd ../cldr; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
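The six invocations above differ only in their --range and --output values, so they can be driven from a loop. The following is a minimal sketch, not something shipped with unicodetools: the function name generate_all_ucdxml and the DRY_RUN variable are hypothetical, and the sketch assumes the same sibling ../cldr and ../Generated directories as the commands above.

```shell
#!/bin/sh
# Sketch: run the UCDXML generator once per range/output combination.
# generate_all_ucdxml and DRY_RUN are illustrative names, not unicodetools features.
# With DRY_RUN=1 the function only prints a one-line summary of each run.
generate_all_ucdxml() {
    for output in FLAT GROUPED; do
        for range in ALL UNIHAN NOUNIHAN; do
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "would run: org.unicode.xml.UCDXML --range $range --output $output"
            else
                # Same arguments as the individual commands above; the inner
                # double quotes are passed through to Maven literally.
                mvn compile exec:java \
                    '-Dexec.mainClass="org.unicode.xml.UCDXML"' \
                    "-Dexec.args=\"--range $range --output $output\"" \
                    "-DCLDR_DIR=$(cd ../cldr; pwd)" \
                    "-DUNICODETOOLS_GEN_DIR=$(cd ../Generated; pwd)" \
                    "-DUNICODETOOLS_REPO_DIR=$(pwd)" || return 1
            fi
        done
    done
}
```

Run generate_all_ucdxml from the repository root; prefixing the call with DRY_RUN=1 previews the six combinations without invoking Maven. The function stops at the first failing run.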
After generating UCDXML files, you can compare:
- Different versions of the same type (range and output) of UCDXML file
- Grouped and flat versions of the same code point range
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.CompareUCDXML"' '-Dexec.args="-a {path to file} -b {path to file}"'
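Because the comparison takes two file paths, it can help to check that both files exist before starting Maven. The wrapper below is a sketch under stated assumptions: compare_ucdxml is a hypothetical helper name, and the paths are assumed to contain no spaces, since they are embedded in a single exec.args string.

```shell
#!/bin/sh
# Sketch: compare two UCDXML files after checking that both exist.
# compare_ucdxml is an illustrative helper, not part of unicodetools.
compare_ucdxml() {
    if [ "$#" -ne 2 ]; then
        echo "usage: compare_ucdxml FILE_A FILE_B" >&2
        return 2
    fi
    for f in "$1" "$2"; do
        if [ ! -f "$f" ]; then
            echo "not a file: $f" >&2
            return 1
        fi
    done
    # Same invocation as the command above, with the two paths filled in.
    mvn compile exec:java \
        '-Dexec.mainClass="org.unicode.xml.CompareUCDXML"' \
        "-Dexec.args=\"-a $1 -b $2\""
}
```

The helper returns 2 on a usage error and 1 when either file is missing; otherwise it returns whatever status Maven reports.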
mvn compile exec:java '-Dexec.mainClass="org.unicode.xml.GeneratePropertyValues"' -DCLDR_DIR=$(cd ../cldr ; pwd) -DUNICODETOOLS_GEN_DIR=$(cd ../Generated ; pwd) -DUNICODETOOLS_REPO_DIR=$(pwd)
UAX42 fragments live in unicodetools/src/main/resources/org/unicode/uax42/fragments
mvn xml:transform -f $(cd ./unicodetools/src/main/resources/org/unicode/uax42; pwd) -Doutputdir=$(cd ../Generated/uax42; pwd)
You'll need a RELAX NG schema validator. This example uses jing-trang.
- Clone and build jing-trang.
- Save the UAX XML file in NFD; the Unihan syntax regular expressions expect NFD.
- Run the following:
java -jar C:\_git\jing-trang\build\jing.jar -c UNICODETOOLS_REPO_DIR\uax\uax42\output\index.rnc <path to UAX xml file>
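To validate several files in one pass, the jing invocation can be wrapped in a small function. This is a sketch only: validate_ucdxml is a hypothetical name, the jar and schema locations are passed in rather than assumed, and it relies on jing exiting with a nonzero status when a file fails validation. It does not check NFD normalization.

```shell
#!/bin/sh
# Sketch: validate one or more XML files against a RELAX NG compact schema.
# validate_ucdxml is an illustrative helper, not part of unicodetools or jing-trang.
# Usage: validate_ucdxml JING_JAR SCHEMA_RNC FILE...
validate_ucdxml() {
    if [ "$#" -lt 3 ]; then
        echo "usage: validate_ucdxml JING_JAR SCHEMA_RNC FILE..." >&2
        return 2
    fi
    jing_jar=$1
    schema=$2
    shift 2
    for required in "$jing_jar" "$schema"; do
        if [ ! -f "$required" ]; then
            echo "not a file: $required" >&2
            return 1
        fi
    done
    status=0
    for xml in "$@"; do
        # -c tells jing that the schema uses the compact (.rnc) syntax.
        if java -jar "$jing_jar" -c "$schema" "$xml"; then
            echo "valid: $xml"
        else
            echo "INVALID: $xml" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function keeps going after a failure so that every file is reported, and returns 1 if any file was invalid.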