If you are using Elm 0.19 and are blocked by this library, consider using hecrj/html-parser for now. It still appears to be a work in progress, but it passes the same test cases, which suggests that the most difficult parts of the HTML spec are already covered (e.g. <li> does not need a closing tag). It also uses the official parser, elm/parser, which I was planning to use in the next version of this library. So I think contributing to hecrj/html-parser is the fastest path to making everyone happy :)
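For readers trying the suggested alternative, here is a minimal sketch of what a call into hecrj/html-parser might look like. It assumes that package exposes `Html.Parser.run`, returning a `Result` whose success value is a list of `Node` values with an `Element` constructor; check the package documentation for the version you install. The module name `MigrationSketch` and the helpers `topLevelTags` and `tagName` are hypothetical names used only for illustration.

```elm
module MigrationSketch exposing (topLevelTags)

-- Assumption: hecrj/html-parser exposes Html.Parser.run and Node(..).

import Html.Parser exposing (Node(..))


-- Collect the tag names of the top-level elements.
-- A parse failure is mapped to an empty list to keep the sketch short;
-- real code would probably want to report the error instead.
topLevelTags : String -> List String
topLevelTags html =
    case Html.Parser.run html of
        Ok nodes ->
            List.filterMap tagName nodes

        Err _ ->
            []


-- Return the tag name for element nodes and Nothing for everything else.
tagName : Node -> Maybe String
tagName node =
    case node of
        Element name _ _ ->
            Just name

        _ ->
            Nothing
```

The main difference from this library's `parse` is that the result is wrapped in a `Result`, so callers have to handle the failure case explicitly.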
Parse HTML in Elm! (DEMO)
import HtmlParser exposing (..)

parse "text" == [ Text "text" ]

parse "<h1>Hello<br>World</h1>"
    == [ Element "h1" [] [ Text "Hello", Element "br" [] [], Text "World" ] ]

parse """<a href="http://example.com">Example</a>"""
    == [ Element "a" [("href", "http://example.com")] [ Text "Example" ] ]
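Because `parse` returns a plain list of nodes, the result can be walked with ordinary pattern matching. The following is a rough sketch rather than part of the library: it assumes `HtmlParser` exposes `parse` and the `Node` constructors shown in the examples above, and the module name `LinkSketch` and the functions `hrefs` and `hrefsOf` are hypothetical helpers.

```elm
module LinkSketch exposing (hrefs)

-- Assumption: HtmlParser exposes parse and Node(..), as the examples above suggest.

import HtmlParser exposing (Node(..), parse)


-- Collect every href attribute value found anywhere in the tree.
hrefs : List Node -> List String
hrefs nodes =
    List.concatMap hrefsOf nodes


-- Collect href values from a single node and, recursively, its children.
hrefsOf : Node -> List String
hrefsOf node =
    case node of
        Element _ attrs children ->
            let
                -- href values on this element itself
                own =
                    attrs
                        |> List.filter (\( key, _ ) -> key == "href")
                        |> List.map Tuple.second
            in
            own ++ hrefs children

        _ ->
            -- Text and any other node kinds carry no attributes.
            []
```

It would be used as `hrefs (parse someHtml)`, where `someHtml` is any HTML string.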
import HtmlParser exposing (..)
import HtmlParser.Util exposing (..)
table = """
<table border=0 cellpadding=0 cellspacing=0 width=216 style='border-collapse:
collapse;width:162pt'>
<!--StartFragment-->
<col width=72 span=3 style='width:54pt'>
<tr height=18 style='height:13.5pt'>
<td height=18 align=right width=72 style='height:13.5pt;width:54pt'>1</td>
<td align=right width=72 style='width:54pt'>2</td>
<td align=right width=72 style='width:54pt'>3</td>
</tr>
<tr height=18 style='height:13.5pt'>
<td height=18 class=xl69 align=right style='height:13.5pt'>2</td>
<td class=xl66 align=right>3</td>
<td align=right>4</td>
</tr>
<!--EndFragment-->
</table>
"""
( parse table
    |> getElementsByTagName "tr"
    |> mapElements
        (\_ _ innerTr ->
            innerTr
                |> mapElements (\_ _ innerTd -> textContent innerTd)
                |> String.join "\t"
                |> String.trim
        )
    |> String.join "\n"
) == "1\t2\t3\n2\t3\t4"
BSD3