
How To Get Orphaned Text With Jsoup?

I have some HTML: This is the first text More text here Another line of text Text in the span Another text in span

Solution 1:

I would use a recursive method that takes your starting tag and iterates over its child nodes. For each TextNode, print its contents. For each Element, call the same method again so that its own child nodes are visited too.

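import java.io.File;
import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

// The class name is arbitrary; any enclosing class works.
public class PrintText
{
    public static void main(String[] args) throws IOException
    {
        // I put your HTML in the body tag in a local file
        Document doc = Jsoup.parse(new File("input/20160505.html"), "UTF-8");
        Elements elements = doc.getElementsByTag("body");
        Element rootTag = elements.get(0);
        printTextOfTag(rootTag);
    }

    // Prints every text node below currentTag, in document order.
    public static void printTextOfTag(Element currentTag)
    {
        List<Node> nodes = currentTag.childNodes();
        for (Node n : nodes)
        {
            if (n instanceof TextNode)
            {
                // Text sitting directly inside the current element
                System.out.println(((TextNode) n).text());
            }
            else if (n instanceof Element)
            {
                // Descend into the child element and print its text nodes too
                printTextOfTag((Element) n);
            }
        }
    }
}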

Output

This is the first text

 More text here Another line of text 

Text in the span



Another text in span

 This is another line
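
The blank lines in the output above most likely come from whitespace-only text nodes between tags. If they are unwanted, one possible variation is to skip blank text nodes and collect the trimmed text into a list instead of printing it. The sketch below is not part of the original answer: the class name `OrphanedText`, the helper names `collectText` and `textsOf`, and the sample markup in `main` are all assumptions made for illustration (the question's real markup is not shown here).

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical class name, used only for this sketch.
public class OrphanedText
{
    // Recursively adds the trimmed text of every non-blank TextNode
    // below 'current' to 'out', in document order.
    static void collectText(Element current, List<String> out)
    {
        for (Node n : current.childNodes())
        {
            if (n instanceof TextNode)
            {
                TextNode t = (TextNode) n;
                if (!t.isBlank())
                {
                    out.add(t.text().trim());
                }
            }
            else if (n instanceof Element)
            {
                collectText((Element) n, out);
            }
        }
    }

    // Parses an HTML string and returns the text pieces found in its body.
    static List<String> textsOf(String html)
    {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        collectText(doc.body(), out);
        return out;
    }

    public static void main(String[] args)
    {
        // Assumed sample markup, not the markup from the question.
        String html = "<div>This is the first text<p>More text here</p>"
                + "Another line of text <span>Text in the span</span></div>";
        for (String s : textsOf(html))
        {
            System.out.println(s);
        }
    }
}
```

If only the text sitting directly inside one element is needed, without descending into its children, `Element.textNodes()` returns just that element's own child text nodes, so no recursion is required.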
