http://java-source.net/open-source/html-parsers
http://mkseo.pe.kr/blog/?p=2316
http://nekohtml.sourceforge.net/
http://jtidy.sourceforge.net/