mtvalidate-0.5/0000755000076500000240000000000010552205722013041 5ustar distlerstaffmtvalidate-0.5/README0000644000076500000240000000776710552205722013742 0ustar distlerstaffMTValidate Plugin ================= Current Version: 0.5 (1/10/2007) Introduction ------------ This plugin is a wrapper around the code that powers the W3C Validator. You can use it to ensure that anything that might appear on your blog (entries, comments) is valid (X)HTML. Originally authored by Alexei Kosut, I seem to be maintaining it now. The plugin provides 3 MT tags This is a container tag, the contents of which will be passed to the Validator. For example Validating Your Comment

Your Comment

<$MTCommentPreviewBody$>

Submitted by <$MTCommentPreviewAuthorLink$> at <$MTCommentPreviewDate$>

in the Comment Preview Template will validate the user's comment, and return a message saying either that the comment was valid or listing the errors it contained. Note that the content of this tag must expand to a fully-formed (X)HTML document, including a DTD.

A conditional tag, which will display its content if the preceding container tag processed a valid document.

A conditional tag, which will display its content if the preceding container tag processed an invalid document.

Examples of the usage of this plugin for validating comments can be found at

    http://golem.ph.utexas.edu/~distler/blog/archives/000155.html

and

    http://golem.ph.utexas.edu/~distler/blog/archives/000383.html

Installation
------------

This may be the trickiest part. There are a number of prerequisites. If you are on a good webhost, everything you need will already be there. If not...

1. First, you need the onsgmls SGML parser (part of the OpenSP distribution, http://openjade.sourceforge.net/). This is standard on Linux, where it's located at /usr/bin/onsgmls. On MacOSX, the easiest way to install it is using fink (http://fink.sf.net/):

       fink install opensp

2. Next, you'll need a bunch of Perl modules. Most of these are standard. Some, marked with an asterisk, are required for MovableType (and so you almost certainly have them already):

       Config::General (v. 2.06 or later)
       File::Spec*
       File::Temp*
       HTML::Entities*
       HTML::Parser (v. 3.25 or later)
       HTML::Template (v. 2.6 or later)*
       Set::IntSpan
       Text::Iconv
       Text::Wrap
       XML::LibXML (optional: needed if you set XHTML_Check = 1)

   The whole shebang (and a few superfluous modules) can be installed via CPAN (http://cpan.org/) using

       install Bundle::W3C::Validator

   More detail about installing via CPAN (especially if you don't have root access) can be found in this old blog entry:

       http://golem.ph.utexas.edu/~distler/blog/archives/000155.html

3. Download, uncompress and untar the plugin

       http://golem.ph.utexas.edu/~distler//blog/files/mvalidate.tar.gz

   (Gee, I guess you just did that.)
You will find, in addition to this README file, a "validator" folder that you should place inside the "plugins" folder of your MovableType installation. 4. Download, uncompress and untar the "sgml-lib" folder http://validator.w3.org/sgml-lib.tar.gz and place it inside the "validator" folder. 5. Open up the file "validator/config/validator.conf" and make sure the location of "onsgmls" is set correctly. Also, decide whether you want to enable additional checking of XHTML files (see http://golem.ph.utexas.edu/~distler/blog/archives/001054.html for details). If so, set XHTML_Check = 1 6. Now copy the "validator" folder to your MovableType "plugins" folder and enjoy ... mtvalidate-0.5/validator/0000755000076500000240000000000010552013653015026 5ustar distlerstaffmtvalidate-0.5/validator/config/0000755000076500000240000000000010551776470016307 5ustar distlerstaffmtvalidate-0.5/validator/config/charset.cfg0000644000076500000240000000544310550343726020420 0ustar distlerstaff# # Mapping of 'charset' or 'encoding' parameter to conversion parameter # # $Id: charset.cfg,v 1.1 2003/01/27 18:05:12 akosut Exp $ # # this version for glibc iconv 2.1; change for other versions # # Syntax: # # charset/encoding = ? result # # Note: charsets and results are lowercase, actions are uppercase # # ? indicates the action to take: # I iconv: use result as input to iconv # Note: use this also if iconv takes charset parameter directly # X: frequent error, e.g. 
starting with x-; ask user to replace with result utf-16 = I utf-16 utf-16be = I utf-16be utf-16le = I utf-16le iso-8859-1 = I iso-8859-1 iso-8859-2 = I iso-8859-2 iso-8859-3 = iso-8859-3 iso-8859-4 = iso-8859-4 iso-8859-5 = I iso-8859-5 iso-8859-6 = I iso-8859-6 # implicit bidi, but character encoding is the same iso-8859-6-i = I iso-8859-6 iso-8859-7 = I iso-8859-7 iso-8859-8 = I iso-8859-8 # implicit bidi, but character encoding is the same iso-8859-8-i = I iso-8859-8 iso-8859-9 = I iso-8859-9 iso-8859-10 = I iso-8859-10 # iso-8859-11/12 don't exist yet iso-8859-13 = I iso-8859-13 iso-8859-14 = I iso-8859-14 iso-8859-15 = I iso-8859-15 us-ascii = I us-ascii iso-2022-jp = I iso-2022-jp shift_jis = I shift_jis euc-jp = I euc-jp gb2312 = I gb2312 big5 = I big5 iso-2022-kr = I iso-2022-kr euc-kr = I euc-kr gb18030 = I gb18030 tis-620 = I tis-620 koi8-r = I koi8-r koi8-u = I koi8-u windows-1250 = I cp1250 windows-1251 = I cp1251 windows-1252 = I cp1252 windows-1253 = I cp1253 windows-1254 = I cp1254 windows-1255 = I cp1255 windows-1256 = I cp1256 windows-1257 = I cp1257 # windows-1258 = I cp1258 # wait until normalization checked macintosh = I macintosh x-mac-roman = X macintosh x-sjis = X shift_jis iso8859-1 = X iso-8859-1 ascii = X us-ascii 8859_1 = X iso-8859-1 # this one is in IANA, but better use only windows-1252 iso-8859-1-Windows-3.1-Latin-1 = X windows-1252 mtvalidate-0.5/validator/config/doctypes.cfg0000644000076500000240000001025110550343726020612 0ustar distlerstaff# # Mapping of HTML Version "names" to DOCTYPEs. Used for DOCTYPE override. 
# # $Id: doctypes.cfg,v 1.1 2003/01/27 18:05:12 akosut Exp $ HTML 0.0 = \ Strict HTML 0.0 = \ HTML 1.0 = \ Strict HTML 1.0 = \ Strict HTML 2.0 = \ HTML 2.0 = \ HTML 2.1E = \ HTML 3.0 (AdvaSoft version) = \ HTML 3.0 (Beta) = \ Strict HTML 3.0 (Beta) = \ Hotjava-HTML = \ Strict Hotjava-HTML = \ Netscape-HTML = \ Strict Netscape-HTML = \ MSIE-HTML = \ Strict MSIE-HTML = \ MSIE 3.0 HTML = \ Strict MSIE 3.0 HTML = \ ORA HTML Extended v1.0 = \ ORA HTML Extended Relaxed v1.0 = \ Apple Help 1.0 = \ HTML 2.2 = \ HTML 1996-01 = \ HTML 3.2 = \ HTML 3.2 + Style = \ HTML Pro = \ Spyglass HTML 2.0 Extended = \ HTML Level Cougar = \ HTML 4.0 Strict = \ HTML 4.0 Transitional = \ HTML 4.0 Frameset = \ HTML 4.01 Strict = \ HTML 4.01 Transitional = \ HTML 4.01 Frameset = \ XHTML 1.0 Strict = \ XHTML 1.0 Transitional = \ XHTML 1.0 Frameset = \ XHTML Basic 1.0 = \ XHTML 1.1 = \ SVG 1.0 = \ SMIL 1.0 = \ SMIL 2.0 = \ mtvalidate-0.5/validator/config/eref.cfg0000644000076500000240000000636210550343726017711 0ustar distlerstaff# # Mapping of element names to an URI fragment for their definition. 
# # $Id: eref.cfg,v 1.1 2003/01/27 18:05:12 akosut Exp $ a = special/a.html abbr = phrase/abbr.html acronym = phrase/acronym.html address = block/address.html applet = special/applet.html area = special/area.html b = fontstyle/b.html base = head/base.html basefont = special/basefont.html bdo = special/bdo.html big = fontstyle/big.html blockquote = block/blockquote.html body = html/body.html br = special/br.html button = forms/button.html caption = tables/caption.html center = block/center.html cite = phrase/cite.html code = phrase/code.html col = tables/col.html colgroup = tables/colgroup.html dd = lists/dd.html del = phrase/del.html dfn = phrase/dfn.html dir = lists/dir.html div = block/div.html dl = lists/dl.html dt = lists/dt.html em = phrase/em.html fieldset = forms/fieldset.html font = special/font.html form = forms/form.html frame = frames/frame.html frameset = frames/frameset.html h1 = block/h1.html h2 = block/h2.html h3 = block/h3.html h4 = block/h4.html h5 = block/h5.html h6 = block/h6.html head = head/head.html hr = block/hr.html html = html/html.html i = fontstyle/i.html iframe = special/iframe.html img = special/img.html input = forms/input.html ins = phrase/ins.html isindex = block/isindex.html kbd = phrase/kbd.html label = forms/label.html legend = forms/legend.html li = lists/li.html link = head/link.html map = special/map.html menu = lists/menu.html meta = head/meta.html noframes = frames/noframes.html noscript = block/noscript.html object = special/object.html ol = lists/ol.html optgroup = forms/optgroup.html option = forms/option.html p = block/p.html param = special/param.html pre = block/pre.html q = special/q.html s = fontstyle/s.html samp = phrase/samp.html script = special/script.html select = forms/select.html small = fontstyle/small.html span = special/span.html strike = fontstyle/strike.html strong = phrase/strong.html style = head/style.html sub = special/sub.html sup = special/sup.html table = tables/table.html tbody = tables/tbody.html 
td = tables/td.html textarea = forms/textarea.html tfoot = tables/tfoot.html th = tables/th.html thead = tables/thead.html title = head/title.html tr = tables/tr.html tt = fontstyle/tt.html u = fontstyle/u.html ul = lists/ul.html var = phrase/var.html mtvalidate-0.5/validator/config/fpis.cfg0000644000076500000240000001057510550343726017732 0ustar distlerstaff# # Mapping of FPIs to plain text version strings. # # $Id: fpis.cfg,v 1.1 2003/01/27 18:05:12 akosut Exp $ -//IETF//DTD HTML Level 0//EN//2.0 = \ HTML 2.0 Level 0 -//IETF//DTD HTML Strict Level 0//EN//2.0 = \ Strict HTML 2.0 Level 0 -//IETF//DTD HTML 2.0 Level 1//EN = \ HTML 2.0 Level 1 -//IETF//DTD HTML 2.0 Strict Level 1//EN = \ Strict HTML 2.0 Level 1 -//IETF//DTD HTML 2.0 Strict//EN = \ Strict HTML 2.0 -//IETF//DTD HTML 2.0//EN = \ HTML 2.0 -//IETF//DTD HTML 2.1E//EN = \ HTML 2.1E -//AS//DTD HTML 3.0 asWedit + extensions//EN = \ HTML 3.0 (AdvaSoft version) -//IETF//DTD HTML 3.0//EN = \ HTML 3.0 (Beta) -//W3O//DTD W3 HTML Strict 3.0//EN// = \ Strict HTML 3.0 (Beta) -//Sun Microsystems Corp.//DTD HotJava HTML//EN = \ Hotjava-HTML -//Sun Microsystems Corp.//DTD HotJava Strict HTML//EN = \ Strict Hotjava-HTML -//WebTechs//DTD Mozilla HTML 2.0//EN = \ Netscape-HTML -//Netscape Comm. Corp. 
Strict//DTD HTML//EN = \ Strict Netscape-HTML -//Microsoft//DTD Internet Explorer 2.0 HTML//EN = \ MSIE-HTML -//Microsoft//DTD Internet Explorer 2.0 HTML Strict//EN = \ Strict MSIE-HTML -//Microsoft//DTD Internet Explorer 3.0 HTML//EN = \ MSIE 3.0 HTML -//Microsoft//DTD Internet Explorer 3.0 HTML Strict//EN Strict MSIE 3.0 HTML -//OReilly and Associates//DTD HTML Extended 1.0//EN = \ O'Reilly HTML Extended v1.0 -//OReilly and Associates//DTD HTML Extended Relaxed 1.0//EN = \ O'Reilly HTML Extended Relaxed v1.0 -//bebop.net//DTD HTML Apple Help 1.0//EN = \ Apple Help 1.0 -//IETF//DTD HTML V2.2//EN = \ HTML 2.2 -//W3C//DTD HTML 1996-01//EN = \ HTML 1996-01 -//W3C//DTD HTML 3.2 Final//EN = \ HTML 3.2 -//W3C//DTD HTML Experimental 970421//EN = \ HTML 3.2 + Style +//Silmaril//DTD HTML Pro v0r11 19970101//EN = \ HTML Pro -//Spyglass//DTD HTML 2.0 Extended//EN = \ Spyglass HTML 2.0 Extended http://www.w3.org/MarkUp/Cougar/Cougar.dtd = \ HTML Level Cougar" -//W3C//DTD HTML 4.0//EN = \ HTML 4.0 Strict -//W3C//DTD HTML 4.0 Transitional//EN = \ HTML 4.0 \ Transitional -//W3C//DTD HTML 4.0 Frameset//EN = \ HTML 4.0 \ Frameset -//W3C//DTD HTML 4.01//EN = \ HTML 4.01 \ Strict -//W3C//DTD HTML 4.01 Transitional//EN = \ HTML 4.01 \ Transitional -//W3C//DTD HTML 4.01 Frameset//EN = \ HTML 4.01 \ Frameset -//W3C//DTD XHTML 1.0 Strict//EN = \ XHTML 1.0 Strict -//W3C//DTD XHTML 1.0 Transitional//EN = \ XHTML 1.0 \ Transitional -//W3C//DTD XHTML 1.0 Frameset//EN = \ XHTML 1.0 \ Frameset XML = \ XML ISO/IEC 15445:2000//DTD HyperText Markup Language//EN = \ ISO/IEC \ 15445:2000 (ISO-HTML) ISO/IEC 15445:2000//DTD HTML//EN = \ ISO/IEC \ 15445:2000 (ISO-HTML) -//W3C//DTD MathML 2.0//EN = \ MathML 2.0 -//W3C//DTD XHTML 1.1 plus MathML 2.0//EN = \ XHTML 1.1 plus MathML 2.0 -//W3C//DTD XHTML Basic 1.0//EN = \ XHTML Basic \ 1.0 -//W3C//DTD XHTML 1.1//EN = \ XHTML 1.1 -//W3C//DTD SVG 1.0//EN = \ SVG 1.0 -//W3C//DTD SVG 20010719//EN = \ SVG 1.0 PR 20010719 -//W3C//DTD SMIL 1.0//EN = \ SMIL 1.0 
-//W3C//DTD SMIL 2.0//EN = \ SMIL 2.0 mtvalidate-0.5/validator/config/frag.cfg0000644000076500000240000000723310550343726017705 0ustar distlerstaff# # Mapping of error message to URI fragment for the explanations. # # $Id: frag.cfg,v 1.1 2003/01/27 18:05:12 akosut Exp $ # # Original SP version. # entity end not allowed in comment = \ unterm-comment-1 name start character invalid only s and comment allowed in comment \ declaration = \ unterm-comment-2 name character invalid only s and comment allowed in comment declaration = \ unterm-comment-2 unknown declaration type FOO = \ bad-comment character FOO not allowed in attribute specification list = \ attr-char an attribute value must be a literal unless it contains only name characters =\ attr-quoted syntax of attribute value does not conform to declared value = \ bad-attr-char length of attribute value must not exceed LITLEN less NORMSEP = \ name-length element FOO undefined = \ undef-tag element FOO not allowed here = \ not-allowed there is no attribute FOO = \ attr-undef FOO is not a member of the group specified in the declared value of this \ attribute = \ undef-attr-val FOO is not a member of a group specified for any attribute = \ bad-abbrev-attr end tag for FOO omitted but its declaration does not permit this = \ no-end-tag end tag for element FOO which is not open = \ floating-close end tag for FOO which is not finished = \ omitted-content start tag for FOO omitted but its declaration does not permit this = \ no-start-tag general entity FOO not defined and no default entity = \ bad-entity non SGML character number = \ bad-char cannot generate system identifier for entity FOO = \ bad-pub-id ID FOO already defined = \ dup-id ID FOO first defined here = \ dup-id # # Horribly verbose versions from lq-nsgmls. 
# document type does not allow element FOO here = \ not-allowed-contained there is no attribute FOO for this element in this HTML version = \ attr-undef an attribute value must be quoted if it contains any character other than \ letters AZaz digits hyphens and periods use quotes if in doubt = \ attr-quoted element FOO not allowed here possible cause is an inline element containing a \ blocklevel element = \ not-allowed element FOO not allowed here check which elements this element may be \ contained within = \ not-allowed missing a required subelement of FOO = \ missing-subel unknown entity FOO = \ bad-entity end tag for FOO omitted possible causes include a missing end tag improper \ nesting of elements or use of an element where it is not allowed = \ no-end-tag start tag was here = \ start-tag end tag for element FOO which is not open try removing the end tag or check \ for improper nesting of elements = \ floating-close element FOO not defined in this HTML version = \ undef-tag required attribute FOO not specified = \ attr-missing text is not allowed here try wrapping the text in a more descriptive \ container = \ text-not-allowed value of attribute FOO cannot be FOO must be one of FOO = \ unkn-att-val character FOO not allowed in attribute specification list possibly caused by \ a missing quotation mark ending a previous attribute value = \ no-attr-end reference to nonSGML character = \ bad-char duplicate specification of attribute FOO = \ dup-attr an attribute specification must start with a name or name token = \ attr-spec-nmtoken invalid comment declaration check your comment syntax = \ inval-comment comment declaration started here = \ inval-comment element FOO not allowed here assuming missing FOO starttag = \ assuming-missing-starttag # # Reported but not yet explained... FOO not finished but containing element ended # # Doesn't work for some reason..? 
invalid attribute value = \ invalid-attr-val mtvalidate-0.5/validator/config/type.cfg0000644000076500000240000000057010550343726017744 0ustar distlerstaff# # Mapping of Content-Type to document type. # # $Id: type.cfg,v 1.1 2003/01/27 18:05:12 akosut Exp $ text/xml = xml+xml image/svg = svg+xml image/svg+xml = svg+xml application/smil = smil+xml application/xml = xml+xml text/html = html text/vnd.wap.wml = xml+xml application/xhtml+xml = html+xml mtvalidate-0.5/validator/config/types.conf0000644000076500000240000002612010550344152020306 0ustar distlerstaff# # Main Document Type Database for the W3C MarkUp Validation Service. # # $Id: types.conf,v 1.1 2003/01/27 18:05:12 akosut Exp $ # # Maintains all information for each of the document types we support. # See 'perldoc Config::General' for the syntax, and be aware that the # 'SplitPolicy' is 'equalsign', ie. keys and values are separated by '\s*=\s*'. # # The meaning of the parameters are mostly obvious and all are documented # in docs/devel.html. Of particular note, the names of each section is # arbitrary and the "datastructure" is turned inside out in the code so # that it is indexed by the PubID. This means you can not have multiple # entries with identical PubID! # # # The five different ways to refer to HTML 2.0: # "HTML 2.0", "HTML 2.0 Level 2", "HTML 2.0 Level 1", # "HTML 2.0 Strict", "HTML 2.0 Strict Level 1". 
Name = html
Display = HTML 2.0
Info_URL = http://www.w3.org/MarkUp/html-spec/
PubID = -//IETF//DTD HTML 2.0//EN
Parse_Mode = SGML
Allowed = text/html
Forbidden = application/xhtml+xml
Preferred = text/html
URI = http://validator.w3.org/images/vh20

Name = html
Display = HTML 2.0 Level 2
Info_URL = http://www.w3.org/MarkUp/html-spec/
PubID = -//IETF//DTD HTML 2.0 Level 2//EN
Parse_Mode = SGML
Allowed = text/html
Forbidden = application/xhtml+xml
Preferred = text/html
URI = http://validator.w3.org/images/vh20

Name = html
Display = HTML 2.0 Level 1
Info_URL = http://www.w3.org/MarkUp/html-spec/
PubID = -//IETF//DTD HTML 2.0 Level 1//EN
Parse_Mode = SGML
Allowed = text/html
Forbidden = application/xhtml+xml
Preferred = text/html
URI = http://validator.w3.org/images/vh20

Name = html
Display = HTML 2.0 Strict
Info_URL = http://www.w3.org/MarkUp/html-spec/
PubID = -//IETF//DTD HTML 2.0 Strict//EN
Parse_Mode = SGML
Allowed = text/html
Forbidden = application/xhtml+xml
Preferred = text/html
URI = http://validator.w3.org/images/vh20

Name = html
Display = HTML 2.0 Strict Level 1
Info_URL = http://www.w3.org/MarkUp/html-spec/
PubID = -//IETF//DTD HTML 2.0 Strict Level 1//EN
Parse_Mode = SGML
Allowed = text/html
Forbidden = application/xhtml+xml
Preferred = text/html
URI = http://validator.w3.org/images/vh20

#
# HTML 3.2.
Name = html
Display = HTML 3.2
Info_URL = http://www.w3.org/TR/REC-html32
PubID = -//W3C//DTD HTML 3.2 Final//EN
Parse_Mode = SGML
Allowed = text/html
Forbidden = application/xhtml+xml
Preferred = text/html
URI = http://www.w3.org/Icons/valid-html32
Height = 31
Width = 88

#
# More "current" document types:
# HTML 4.0, HTML 4.01, XHTML 1.0, XHTML 1.1, XHTML Basic.
# (The three first in "Strict", "Transitional", and "Frameset" variants) Name = html Display = HTML 4.0 Strict Info_URL = http://www.w3.org/TR/1998/REC-html40-19980424/ PubID = -//W3C//DTD HTML 4.0//EN SysID = http://www.w3.org/TR/1998/REC-html40-19980424/strict.dtd Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html Allowed = Required = 0 URI = http://www.w3.org/Icons/valid-html401 Height = 31 Width = 88 Name = html Display = HTML 4.0 Transitional Info_URL = http://www.w3.org/TR/1998/REC-html40-19980424/ PubID = -//W3C//DTD HTML 4.0 Transitional//EN SysID = http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html Allowed = Required = 0 URI = http://www.w3.org/Icons/valid-html401 Height = 31 Width = 88 Name = html Display = HTML 4.0 Frameset Info_URL = http://www.w3.org/TR/1998/REC-html40-19980424/ PubID = -//W3C//DTD HTML 4.0 Frameset//EN SysID = http://www.w3.org/TR/1998/REC-html40-19980424/frameset.dtd Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html Allowed = Required = 0 URI = http://www.w3.org/Icons/valid-html401 Height = 31 Width = 88 Name = html Display = HTML 4.01 Strict Info_URL = http://www.w3.org/TR/1999/REC-html401-19991224/ PubID = -//W3C//DTD HTML 4.01//EN SysID = http://www.w3.org/TR/1999/REC-html401-19991224/strict.dtd Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html Allowed = Required = 0 URI = http://www.w3.org/Icons/valid-html401 Height = 31 Width = 88 Name = html Display = HTML 4.01 Transitional Info_URL = http://www.w3.org/TR/1999/REC-html401-19991224/ PubID = -//W3C//DTD HTML 4.01 Transitional//EN SysID = http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html Allowed = Required = 0 URI = http://www.w3.org/Icons/valid-html401 
Height = 31 Width = 88 Name = html Display = HTML 4.01 Frameset Info_URL = http://www.w3.org/TR/1999/REC-html401-19991224/ PubID = -//W3C//DTD HTML 4.01 Frameset//EN SysID = http://www.w3.org/TR/1999/REC-html401-19991224/frameset.dtd Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html Allowed = Required = 0 URI = http://www.w3.org/Icons/valid-html401 Height = 31 Width = 88 Name = html Display = XHTML 1.0 Strict Info_URL = http://www.w3.org/TR/2000/REC-xhtml1-20000126/ PubID = -//W3C//DTD XHTML 1.0 Strict//EN SysID = http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd Parse_Mode = XML Allowed = text/html Allowed = application/xhtml+xml Preferred = application/xhtml+xml Allowed = http://www.w3.org/1999/xhtml Required = 1 URI = http://www.w3.org/Icons/valid-xhtml10 Height = 31 Width = 88 Name = html Display = XHTML 1.0 Transitional Info_URL = http://www.w3.org/TR/2000/REC-xhtml1-20000126/ PubID = -//W3C//DTD XHTML 1.0 Transitional//EN SysID = http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-transitional.dtd Parse_Mode = XML Allowed = text/html Allowed = application/xhtml+xml Preferred = application/xhtml+xml Allowed = http://www.w3.org/1999/xhtml Required = 1 URI = http://www.w3.org/Icons/valid-xhtml10 Height = 31 Width = 88 Name = html Display = XHTML 1.0 Frameset Info_URL = http://www.w3.org/TR/2000/REC-xhtml1-20000126/ PubID = -//W3C//DTD XHTML 1.0 Frameset//EN SysID = http://www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-frameset.dtd Parse_Mode = XML Allowed = text/html Allowed = application/xhtml+xml Preferred = application/xhtml+xml Allowed = http://www.w3.org/1999/xhtml Required = 1 URI = http://www.w3.org/Icons/valid-xhtml10 Height = 31 Width = 88 Name = html Display = XHTML Basic 1.0 Info_URL = http://www.w3.org/TR/xhtml-basic/ PubID = -//W3C//DTD XHTML Basic 1.0//EN SysID = http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/xhtml-basic10.dtd Parse_Mode = XML Allowed = text/html Allowed = 
application/xhtml+xml Preferred = application/xhtml+xml Allowed = http://www.w3.org/1999/xhtml Required = 1 URI = http://validator.w3.org/images/vxhtml-basic10 Height = 31 Width = 88 Name = html Display = XHTML 1.1 Info_URL = http://www.w3.org/TR/xhtml11/ PubID = -//W3C//DTD XHTML 1.1//EN SysID = http://www.w3.org/TR/2001/REC-xhtml11-20010531/DTD/xhtml11-flat.dtd Parse_Mode = XML Allowed = application/xhtml+xml Forbidden = text/html Preferred = application/xhtml+xml Allowed = http://www.w3.org/1999/xhtml Required = 1 URI = http://www.w3.org/Icons/valid-xhtml11 Height = 31 Width = 88 Name = html Display = XHTML 1.1 + MathML 2.0 Info_URL = http://www.w3.org/TR/xhtml11/ PubID = -//W3C//DTD XHTML 1.1 plus MathML 2.0//EN SysID = http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd Parse_Mode = XML Allowed = application/xhtml+xml Forbidden = text/html Preferred = application/xhtml+xml Allowed = http://www.w3.org/1999/xhtml Required = 1 URI = http://www.w3.org/Icons/valid-xhtml11 Height = 31 Width = 88 Name = html Display = ISO/IEC 15445:2000 ("ISO HTML") Info_URL = http://purl.org/NET/ISO+IEC.15445/15445.html PubID = ISO/IEC 15445:2000//DTD HTML//EN Parse_Mode = SGML Allowed = text/html Forbidden = application/xhtml+xml Preferred = text/html URI = http://validator.w3.org/images/v15445 # # @@@FIXME: Need to add in SVG, SMIL, etc. mtvalidate-0.5/validator/config/validator.conf0000644000076500000240000000223010551776467021146 0ustar distlerstaff####### User-configurable Options ########################### # # # The SGML Parser to use. Defaults to /usr/bin/onsgmls. SGML_Parser = /usr/local/bin/onsgmls # # Whether to perform an additional XHTML well-formedness check # Defaults to 0 (don't perform the check) # Set to 1 to enable the check # Enabling this feature requires the XML::LibXML Perl module XHTML_Check = 0 # # The SGML Library Path. 
SGML_Library = plugins/validator/sgml-lib # #################### No user-serviceable parts below ############ # # Email address of the maintainer of this service. Maintainer = akosut@rescomp.stanford.edu # # The "Home Page" for the service. Home_Page = http://validator.w3.org/ # # Base URI To Error Explanations (doc/errors.html) Msg_FAQ_URI = ${Home_Page}docs/errors.html # # Base URI for the Element Reference. Element_Ref_URI = http://www.htmlhelp.com/reference/html40/ # # Mapping tables etc... Include eref.cfg Include frag.cfg Include type.cfg Include charset.cfg Include types.conf mtvalidate-0.5/validator/config/verbose.cfg0000644000076500000240000004557410550343726020445 0ustar distlerstaff25

This is usually a cascading error caused by an undefined entity reference or use of an unencoded ampersand (&) in an URL or body text. See the previous message for further details.

28

Check that you are using a proper syntax for your comments, e.g: <!-- comment here -->. This error may appear if you forget the last "--" to close one comment, therefore including the rest of the content in your comment.

38

Did you forget to close a (double) quote mark?

42

This error may appear if you are using a bad syntax for your comments, such as "<!invalid comment>" The proper syntax for comments is <!-- your comment here -->.

63

You have used character data somewhere it is not permitted to appear. Mistakes that can cause this error include putting text directly in the body of the document without wrapping it in a container element (such as a <p>aragraph</p>) or forgetting to quote an attribute value (where characters such as "%" and "/" are common, but cannot appear without surrounding quotes).

64

The element named above was found in a context where it is not allowed. This could mean that you have incorrectly nested elements -- such as a "style" element in the "body" section instead of inside "head" -- or two elements that overlap (which is not allowed).

One common cause for this error is the use of XHTML syntax in HTML documents. Due to HTML's rules of implicitly closed elements, this error can create cascading effects. For instance, using XHTML's "self-closing" tags for "meta" and "link" in the "head" section of an HTML document may cause the parser to infer the end of the "head" section and the beginning of the "body" section (where "link" and "meta" are not allowed; hence the reported error).

65

The mentioned element is not allowed to appear in the context in which you've placed it; the other mentioned elements are the only ones that are both allowed there and can contain the element mentioned. This might mean that you need a containing element, or possibly that you've forgotten to close a previous element.

One possible cause for this message is that you have attempted to put a block-level element (such as "<p>" or "<table>") inside an inline element (such as "<a>", "<span>", or "<font>").

68
  • You forgot to close a tag, or
  • you used something inside this tag that was not allowed, and the validator is complaining that the tag should be closed before such content can be allowed.

The next message, "start tag was here", points to the particular instance of the tag in question; the positional indicator points to where the validator expected you to close the tag.

69

This is not an error, but rather a pointer to the start tag of the element the previous error referred to.

70

You may have neglected to close a tag, or perhaps you meant to "self-close" a tag; that is, ending it with "/>" instead of ">".

71

This is not an error, but rather a pointer to the start tag of the element the previous error referred to.

73

Most likely, you nested tags and closed them in the wrong order. For example, <p><em>...</p> is not acceptable, as <em> must be closed before <p>. Acceptable nesting is: <p><em>...</em></p>.

Another possibility is that you used an element which requires a child element that you did not include. Hence the parent element is "not finished", not complete. For instance, <head> generally requires a <title>, lists (ul, ol, dl) require list items (li, or dt, dd), and so on.
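
The wrong-order case described above can be caught with a simple stack of open elements. The following is a minimal Python sketch, not the validator's own logic: the names NestingChecker and check_nesting are hypothetical, and the check is stricter than a real DTD because it ignores HTML's optional end tags (for example </p> and </li>).

```python
from html.parser import HTMLParser

# Elements that take no end tag in HTML 4; they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class NestingChecker(HTMLParser):
    """Hypothetical helper: records end tags that do not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, innermost last
        self.problems = []   # (end tag seen, element that was actually open)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # XHTML-style <br /> opens and closes at once, so the stack is unchanged.
        pass

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            innermost = self.stack[-1] if self.stack else None
            self.problems.append((tag, innermost))


def check_nesting(markup):
    """Return a list of (end_tag, innermost_open_tag) mismatches."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

For <p><em>...</p> the sketch reports that </p> arrived while <em> was still open; for <p><em>...</em></p> it reports nothing.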

76

You have used the element named above in your document, but the document type you are using does not define an element of that name. This error is often caused by:

  • incorrect use of the "Strict" document type with a document that uses frames (e.g. you must use the "Frameset" document type to get the "<frameset>" element),
  • by using vendor proprietary extensions such as "<spacer>" or "<marquee>" (this is usually fixed by using CSS to achieve the desired effect instead),
  • by using upper-case tags in XHTML (in XHTML, attributes and elements must be all lower-case).

79

The Validator found an end tag for the above element, but that element is not currently open. This is often caused by a leftover end tag from an element that was removed during editing, or by an implicitly closed element (if you have an error related to an element being used where it is not allowed, this is almost certainly the case). In the latter case this error will disappear as soon as you fix the original problem.

82

You have used a character that is not considered a "name character" in an attribute value. Which characters are considered "name characters" varies between the different document types, but a good rule of thumb is that unless the value contains only lower or upper case letters in the range a-z you must put quotation marks around the value. In fact, unless you have extreme file size requirements it is a very very good idea to always put quote marks around your attribute values. It is never wrong to do so, and very often it is absolutely necessary.
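
As a rough illustration of the "always quote" advice, here is a minimal Python sketch. The helper names needs_quotes and quote_attr are hypothetical, and the set of characters treated as safe without quotes (letters, digits, hyphens and periods) is taken from the wording of the validator's own error message; a given document type may allow slightly more.

```python
import re
from html import escape

# Characters the error message names as safe in an unquoted value:
# letters, digits, hyphens and periods.
_NAME_CHARS = re.compile(r"[A-Za-z0-9.\-]+")


def needs_quotes(value):
    """True if the value contains anything other than the safe characters (or is empty)."""
    return _NAME_CHARS.fullmatch(value) is None


def quote_attr(name, value):
    """Always emit the quoted form, which is never wrong."""
    return f'{name}="{escape(value, quote=True)}"'
```

With this sketch, needs_quotes("80%") is true while needs_quotes("center") is false, and quote_attr("width", "80%") yields width="80%".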

105

Check for stray quotes or incomplete attribute definitions.

107

"VI delimiter" is a technical term for the equal sign. This error message means that the name of an attribute and the equal sign cannot be omitted when specifying an attribute. A common cause for this error message is the use of "Attribute Minimization" in document types where it is not allowed, in XHTML for instance.

How to fix: For attributes such as compact, checked or selected, do not write e.g. <option selected ... but rather <option selected="selected" ...
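
The fix above can be applied mechanically. The following Python sketch is only an illustration under stated assumptions: the names MinimizedAttrExpander and expand_minimized are hypothetical, comments and DOCTYPE declarations are dropped for brevity, and tag and attribute names come out lower-cased because that is how Python's html.parser reports them.

```python
from html import escape
from html.parser import HTMLParser


class MinimizedAttrExpander(HTMLParser):
    """Hypothetical helper: rewrites <option selected> as <option selected="selected">."""

    def __init__(self):
        # Keep entity and character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _render(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:
                # html.parser reports a minimized attribute with value None.
                value = name
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._render(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._render(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def expand_minimized(markup):
    """Return the markup with every minimized attribute spelled out in full."""
    parser = MinimizedAttrExpander()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Attributes that already carry a value pass through unchanged apart from being re-quoted with double quotes.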

108

You have used the attribute named above in your document, but the document type you are using does not support that attribute for this element. This error is often caused by incorrect use of the "Strict" document type with a document that uses frames (e.g. you must use the "Transitional" document type to get the "target" attribute), or by using vendor proprietary extensions such as "marginheight" (this is usually fixed by using CSS to achieve the desired effect instead).

This error may also result if the element itself is not supported in the document type you are using, as an undefined element will have no supported attributes; in this case, see the element-undefined error message for further information.

How to fix: check the spelling and case of the element and attribute (remember, XHTML is all lower-case), and/or check that they are both allowed in the chosen document type, and/or use CSS instead of this attribute.

111

Have you forgotten the "equal" sign marking the separation between the attribute and its declared value? Typical syntax is attribute="value".

112

You have specified an attribute more than once. For instance, you may have used the same attribute twice on the same "img" tag.

120

This error almost always means that you've forgotten a closing quote on an attribute value. For instance, in:

<img src="fred.gif>
<!-- 50 lines of stuff -->
<img src="joe.gif">

The "src" value for the first <img> is the entire fifty lines of stuff up to the next double quote, which probably exceeds the SGML-defined length limit for HTML string literals. Note that the position indicator in the error message points to where the attribute value ended — in this case, the "joe.gif" line.

121

The value of an attribute contained something that is not allowed by the specified syntax for that type of attribute. For instance, the “selected” attribute must be either minimized as “selected” or spelled out in full as “selected="selected"”; the variant “selected=""” is not allowed.

122

It is possible that you violated the naming convention for this attribute. For example, id and name attributes must begin with a letter, not a digit.
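
A minimal sketch of the naming rule, using hypothetical id values:

```html
<!-- invalid: the id starts with a digit -->
<div id="1st-section">...</div>

<!-- valid: the id starts with a letter -->
<div id="section-1">...</div>
```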

123

This attribute can not take a space-separated list of words as a value, but only one word ("token"). This may also be caused by the use of a space for the value of an attribute which does not permit it.

124

The value of this attribute should be a number, and you probably used a wrong syntax.

125

It is possible that you violated the naming convention for this attribute. For example, id and name attributes must begin with a letter, not a digit.

127

The attribute given above is required for an element that you've used, but you have omitted it. For instance, in most HTML and XHTML document types the "type" attribute is required on the "script" element and the "alt" attribute is required for the "img" element.

Typical values for type are type="text/css" for <style> and type="text/javascript" for <script>.

131

The value of the attribute is defined to be one of a list of possible values but in the document it contained something that is not allowed for that type of attribute. For instance, the “selected” attribute must be either minimized as “selected” or spelled out in full as “selected="selected"”; a value like “selected="true"” is not allowed.

137

Check that you are using a proper syntax for your comments, e.g: <!-- comment here -->. This error may appear if you forget the last "--" to close one comment, and later open another.

139

You have used an illegal character in your text. HTML uses the standard UNICODE Consortium character repertoire, and it leaves undefined (among others) 65 character codes (0 to 31 inclusive and 127 to 159 inclusive) that are sometimes used for typographical quote marks and similar in proprietary character sets. The validator has found one of these undefined characters in your document. The character may appear on your browser as a curly quote, or a trademark symbol, or some other fancy glyph; on a different computer, however, it will likely appear as a completely different character, or nothing at all.

Your best bet is to replace the character with the nearest equivalent ASCII character, or to use an appropriate character entity. For more information on Character Encoding on the web, see Alan Flavell's excellent HTML Character Set Issues reference.

This error can also be triggered by formatting characters embedded in documents by some word processors. If you use a word processor to edit your HTML documents, be sure to use the "Save as ASCII" or similar command to save the document without formatting information.

141

An "id" is a unique identifier. Each time this attribute is used in a document it must have a different value. If you are using this attribute as a hook for style sheets it may be more appropriate to use classes (which group elements) than ids (which are used to identify exactly one element).
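
For example (the "note" name is purely illustrative), switching from a repeated id to a class might look like this:

```html
<!-- invalid: the same id is used twice -->
<p id="note">First note.</p>
<p id="note">Second note.</p>

<!-- valid: a class can be shared by any number of elements -->
<p class="note">First note.</p>
<p class="note">Second note.</p>
```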

183

This error can be triggered by:

  • A non-existent input, select or textarea element
  • A missing id attribute
  • A typographical error in the id attribute

Try to check the spelling and case of the id you are referring to.
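
One common case, assuming the reference in question is the "for" attribute of a label (the field names here are hypothetical), might look like this:

```html
<!-- invalid: no element has id="email" ("e-mail" is a different id) -->
<label for="email">Email</label>
<input type="text" name="email" id="e-mail" />

<!-- valid: the "for" value and the id match exactly -->
<label for="email">Email</label>
<input type="text" name="email" id="email" />
```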

247

The sequence <FOO /> can be interpreted in at least two different ways, depending on the DOCTYPE of the document. For HTML 4.01 Strict, the '/' terminates the tag <FOO (with an implied '>'). However, since many browsers don't interpret it this way, even in the presence of an HTML 4.01 Strict DOCTYPE, it is best to avoid it completely in pure HTML documents and reserve its use solely for those written in XHTML.

325

This is usually a cascading error caused by an undefined entity reference or use of an unencoded ampersand (&) in a URL or body text. See the previous message for further details.

338

An entity reference was found in the document, but there is no reference by that name defined. Often this is caused by misspelling the reference name, unencoded ampersands, or by leaving off the trailing semicolon (;). The most common cause of this error is unencoded ampersands in URLs as described by the WDG in "Ampersands in URLs".

Entity references start with an ampersand (&) and end with a semicolon (;). If you want to use a literal ampersand in your document you must encode it as "&amp;" (even inside URLs!). Be careful to end entity references with a semicolon or your entity reference may get interpreted in connection with the following text. Also keep in mind that named entity references are case-sensitive; &Aelig; and &aelig; are different characters.
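
As a sketch of the most common case (archive.cgi and its parameters are hypothetical), a URL with a query string needs its ampersand encoded:

```html
<!-- invalid: "&page" looks like the start of an entity reference -->
<a href="archive.cgi?year=2007&page=2">Next page</a>

<!-- valid: the ampersand is encoded -->
<a href="archive.cgi?year=2007&amp;page=2">Next page</a>
```

The browser decodes "&amp;" back to "&" before following the link, so the URL that is actually requested is unchanged.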

If this error appears in some markup generated by PHP's session handling code, this article has explanations and solutions to your problem.

Note that in most documents, errors related to entity references will trigger up to 5 separate messages from the Validator. Usually these will all disappear when the original problem is fixed.

344

The checked page did not contain a document type ("DOCTYPE") declaration. The Validator has tried to validate with the HTML 4.01 Transitional DTD, but this is quite likely to be incorrect and will generate a large number of incorrect error messages. It is highly recommended that you insert the proper DOCTYPE declaration in your document -- instructions for doing this are given above -- and it is necessary to have this declaration before the page can be declared to be valid.

387

This may happen if you have consecutive comments but did not close one of them properly. The proper syntax for comments is <!-- my comment -->.

394

If you meant to include an entity that starts with "&", then you should terminate it with ";". Another reason for this error message is that you inadvertently created an entity by failing to escape an "&" character just before this text.

403

This is generally the sign of an ampersand that was not properly escaped for inclusion in an attribute, in an href for example. You will need to escape all instances of '&' into '&amp;'.

404

This message may appear in several cases:

  • You tried to include the "<" character in your page: you should escape it as "&lt;"
  • You used an unescaped ampersand "&": this may be valid in some contexts, but it is recommended to use "&amp;", which is always safe.
  • Another possibility is that you forgot to close quotes in a previous tag.

407

This error may occur when there is a mistake in how a self-closing tag is closed, e.g '.../ >'. The proper syntax is '... />' (note the position of the space).
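
For instance, with a line break element:

```html
<!-- invalid: space between "/" and ">" -->
<br/ >

<!-- valid: space (if any) goes before the "/" -->
<br />
```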

410

You've included a character reference to a character that is not defined in the document type you've chosen. This is most commonly caused by numerical references to characters from vendor proprietary character repertoires. Often the culprit will be fancy or typographical quote marks from either the Windows or Macintosh character repertoires.

The solution is to reference UNICODE characters instead. A list of common characters from the Windows character repertoire and their UNICODE equivalents can be found in the document "On the use of some MS Windows characters in HTML" maintained by Jukka Korpela <jkorpela@cs.tut.fi>.
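
As an illustration, assuming the offending references are the Windows curly double quotes (code points 147 and 148 in that repertoire), the fix would be:

```html
<!-- invalid: 147 and 148 are undefined in HTML -->
<p>&#147;Hello&#148;</p>

<!-- valid: the corresponding UNICODE characters -->
<p>&#8220;Hello&#8221;</p>
```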

mtvalidate-0.5/validator/MTValidate.pl

# MTValidate 0.5
# $Date: 2007/1/7 14:18:33 $
# by Jacques Distler
# original by Alexei Kosut
# Based on:
# W3C MarkUp Validation Service
# A CGI script to retrieve and validate a MarkUp file
#
# Copyright 1995-2002 Gerald Oskoboiny
# for additional contributors, see http://dev.w3.org/cvsweb/validator/
#
# This source code is available under the license at:
# http://www.w3.org/Consortium/Legal/copyright-software
#
# Id: check,v 1.321 2003/01/03 20:21:55 ville Exp

package MT::Plugin::MTValidate;
use vars qw( $VERSION );
$VERSION = '0.5';

# We need Perl 5.6.0+.
use 5.006;

# Movable Type
use MT::Template::Context;
use MT;

eval{ require MT::Plugin;};
unless ($@) {
    my $plugin = {
        name => "MTValidate",
        version => $VERSION,
        description => "A wrapper around a local copy of the W3C Validator",
        doc_link => 'http://golem.ph.utexas.edu/~distler/blog/MTValidate.html',
        init_app => \&get_configs,
    };
    MT->add_plugin(new MT::Plugin($plugin));
}

#
# Pragmas.
use strict 'vars';
use warnings;

###############################################################################
#### Load modules. ############################################################
###############################################################################

#
# Modules.
#
# Version numbers given where we absolutely need a minimum version of a given
# module (gives nicer error messages). By default, add an empty import list
# when loading modules to prevent non-OO or poorly written modules from
# polluting our namespace.
#

use Config::General 2.06 qw(); # Need 2.06 for -SplitPolicy
use File::Spec           qw();
use HTML::Parser    3.25 qw(); # Need 3.25 for $p->ignore_elements.
use HTML::Template  2.6  qw();
use File::Temp           qw(tempfile);
use Set::IntSpan         qw();
use Text::Iconv          qw();
use Text::Wrap           qw(wrap);
use HTML::Entities       qw(encode_entities decode_entities);

my $MTV_CFG;
my $vdir = MT::ConfigMgr->instance->PluginPath .'/validator';

sub get_configs {
  #
  # Read Config Files.
  eval {
    my %config_opts =
      (-ConfigFile => "$vdir/config/validator.conf",
       -MergeDuplicateOptions => 'yes',
       -SplitPolicy => 'equalsign',
       -UseApacheInclude => 1,
       -IncludeRelative => 1,
       -InterPolateVars => 1,
       -DefaultConfig => { SGML_Parser => '/usr/bin/onsgmls',
                           SGML_Library => "$vdir/sgml-lib",
                           XHTML_Check => 0,
                         },
      );
    %$MTV_CFG = Config::General->new(%config_opts)->getall();
  };
  if ($@) {
    die <<".EOF.";
Couldn't read configuration. The error reported was:
'$@'
.EOF.
  }

  # Make types config indexed by FPI.
  my $_types = {};
  foreach (keys %{$MTV_CFG->{Types}}) {
    $_types->{$MTV_CFG->{Types}->{$_}->{PubID}} = $MTV_CFG->{Types}->{$_}
  }
  $MTV_CFG->{Types} = $_types;

  #
  # Make sure onsgmls exists and is executable.
  unless (-x $MTV_CFG->{SGML_Parser}) {
    die qq(Configured SGML Parser "$MTV_CFG->{SGML_Parser}" not executable!\n);
  }
}

MT::Template::Context->add_conditional_tag(ValidateIfValid => sub {
  my $ctx = shift;
  return $ctx->stash('valid');
});

MT::Template::Context->add_conditional_tag(ValidateIfInvalid => sub {
  my $ctx = shift;
  return !$ctx->stash('valid');
});

MT::Template::Context->add_container_tag('Validate' => sub {
  my ($ctx, $args) = @_;

  # Build the content
  my $builder = $ctx->stash('builder');
  my $tokens = $ctx->stash('tokens');
  my $content;
  defined($content = $builder->build($ctx, $tokens))
    or return $ctx->error($ctx->errstr);

  #
  # The data structure that will hold all session data.
  my $File;

  #################################
  # Initialize the datastructure. #
  #################################

  #
  # Charset data (casing policy: lowercase early).
  $File->{Charset}->{Use} = ''; # The charset used for validation.
$File->{Charset}->{Auto} = ''; # Autodetection using XML rules (Appendix F) $File->{Charset}->{HTTP} = ''; # From HTTP's "charset" parameter. $File->{Charset}->{META} = ''; # From HTML's . $File->{Charset}->{XML} = ''; # From the XML Declaration. # # Misc simple types. $File->{Type} = ''; # # Array (ref) used to store character offsets for the XML report. $File->{Offsets}->[0] = [0, 0]; # The first item isn't used... # # Listrefs. $File->{Lines} = []; # Line numbers for encoding errors. $File->{Warnings} = []; # Warnings... $File->{'Other Namespaces'} = []; # Other (non-root) Namespaces. ########################################################################### #### Generate Template for Result. ######################################## ########################################################################### my $T = HTML::Template->new( filename => "$vdir/templates/result.tmpl", die_on_bad_params => 0, ); my $E = HTML::Template->new( filename => "$vdir/templates/fatal-error.tmpl", die_on_bad_params => 0, ); $T->param(cfg_home_page => $MTV_CFG->{Home_Page}); ######################################### # Populate $File->{Opt} -- CGI Options. # ######################################### # # Set session switches. $File->{Opt}->{'No Attributes'} = 0; # # If ";debug" was given, let it overrule the value from the config file, # regardless of whether it's "0" or "1" (on or off). return $E->output if (&abort_if_error_flagged($File, $E, 0)); # # Get the file and metadata. $File->{Bytes} = $content; $File->{Type} = 'html'; $File->{'Is Upload'} = 1; # # Abort if an error was flagged during initialization. return $E->output if (&abort_if_error_flagged($File, $E, 0)); ########################################################################### #### Output validation results. ########################################### ########################################################################### # # Find the XML Encoding. 
$File = &find_xml_encoding($File); # # Decide on a charset to use (first part) # if ($File->{Charset}->{XML}) { $File->{Charset}->{Use} = $File->{Charset}->{XML}; } elsif ($File->{Charset}->{Auto} =~ /^utf-16[bl]e$/ && $File->{BOM} == 2) { $File->{Charset}->{Use} = 'utf-16'; } $File->{Content} = &normalize_newlines($File->{Bytes}, exact_charset($File, $File->{Charset}->{Use})); $File->{Content}->[0] = substr $File->{Content}->[0], $File->{BOM}; # remove BOM #### add warning about BOM in UTF-8 # # Try to extract META charset # (works only if ascii-based and reasonably clean before ) $File = &preparse($File); unless ($File->{Charset}->{Use}) { $File->{Charset}->{Use} = $File->{Charset}->{META}; } unless ($File->{Charset}->{Use}) { $File->{'Error Flagged'} = 1; $File->{'Error Message'} = <<".EOF.";

I was not able to extract a character encoding labeling from any of the valid sources for such information. Without encoding information it is impossible to validate the document. The sources I tried are:

  • The XML Declaration.
  • The HTML "META" element.

And I even tried to autodetect it using the algorithm defined in Appendix F of the XML 1.0 Recommendation.

Since none of these sources yielded any usable information, I will not be able to validate this document. Sorry. Please make sure you specify the character encoding in use.

IANA maintains the list of official names for character sets.

.EOF. } # # Abort if an error was flagged while finding the encoding. return $E->output if &abort_if_error_flagged($File, $E, 2|4); # # Check the detected Encoding and transcode. if (&conflict($File->{Charset}->{Use}, 'utf-8')) { $File = &transcode($File); return $E->output if &abort_if_error_flagged($File, $E, 0); } $File = &check_utf8($File); # always check $File = &byte_error($File); # # Abort if an error was flagged during transcoding return $E->output if &abort_if_error_flagged($File, $E, 1); # # Overall parsing algorithm for documents returned as text/html: # # For documents that come to us as text/html, # # 1. check if there's a doctype # 2. if there is a doctype, parse/validate against that DTD # 3. if no doctype, check for an xmlns= attribute on the first element # 4. if there is an xmlns= attribute, check for XML well-formedness # 5. if there is no xmlns= attribute, and no DOCTYPE, punt. # # # Try to extract a DOCTYPE or xmlns. $File = &preparse($File); # # Set document type to XHTML if the DOCTYPE was for XHTML. # Set document type to MathML if the DOCTYPE was for MathML. # This happens when the file is served as text/html $File->{Type} = 'xhtml+xml' if $File->{DOCTYPE} =~ /xhtml/i; $File->{Type} = 'mathml+xml' if $File->{DOCTYPE} =~ /mathml/i; # # Sanity check Charset information and add any warnings necessary. $File = &charset_conflicts($File); # # By default, use SGML catalog file and SGML Declaration. my $catalog = File::Spec->catfile($MTV_CFG->{SGML_Library}, 'sgml.soc'); my @xmlflags = qw( -R -wvalid -wnon-sgml-char-ref -wno-duplicate ); # # Switch to XML semantics if file is XML. if (&is_xml($File)) { $catalog = File::Spec->catfile($MTV_CFG->{SGML_Library}, 'xml.soc'); push(@xmlflags, '-wxml'); } # # Set final command to use. my @cmd = ($MTV_CFG->{SGML_Parser}, '-n', '-D', $MTV_CFG->{SGML_Library}, '-c', $catalog, '-E0', @xmlflags); my $cmd = join(' ',@cmd); # # Set debug info for HTML report. $T->param(is_debug => 0); # # Temporary filehandles. 
my ($SPIN, $spin ) = tempfile( UNLINK => 1 ); my ($SPOUT, $spout) = tempfile( UNLINK => 1 ); my ($SPERR, $sperr) = tempfile( UNLINK => 1 ); # # Dump file to a temp file for parsing. for (@{$File->{Content}}) { print $SPIN $_, "\n"; } # # Run it through SP, redirecting output to temporary files. system($cmd . "<$spin >$spout 2>$sperr"); close $SPIN; $File = &parse_errors($File, $SPERR, $E); # Parse error output. return $E->output unless $File; close $SPERR; $File->{ESIS} = []; my $elements_found = 0; while (<$SPOUT>) { push @{$File->{'DEBUG'}->{ESIS}}, $_; $elements_found++ if /^\(/; if (/^Axmlns() \w+ (.*)/ or /^Axmlns:([^ ]+) \w+ (.*)/) { if (not $File->{Namespace} and $elements_found == 0 and $1 eq "") { $File->{Namespace} = $2; } $File->{Namespaces}->{$2}++ unless $2 eq $File->{Namespace}; } next if / IMPLIED$/; next if /^ASDAFORM CDATA /; next if /^ASDAPREF CDATA /; chomp; # Removes trailing newlines push @{$File->{ESIS}}, $_; } close $SPOUT; eval {unlink ($spin, $spout, $sperr);}; # just in case # # Check whether the parser thought it was Valid. if ($File->{ESIS}->[-1] =~ /^C$/) { delete $File->{ESIS}->[-1]; $File->{'Is Valid'} = 1; } else { $File->{'Is Valid'} = 0; } # # Extract the Namespaces. $File->{Namespaces} = [map {name => '', uri => $_}, keys %{$File->{Namespaces}}]; # # Set Version to be the FPI initially. $File->{Version} = $File->{DOCTYPE}; # # Extract any version attribute from the ESIS. for (@{$File->{ESIS}}) { next unless /^AVERSION CDATA (.*)/; $File->{Version} = $1; last; } # # Force "XML" if type is an XML type and an FPI was not found. # Otherwise set the type to be the FPI. if (&is_xml($File) and not $File->{DOCTYPE}) { $File->{Version} = 'XML'; } else { $File->{Version} = $File->{DOCTYPE} unless $File->{Version}; } # # Get the pretty text version of the FPI if a mapping exists. 
if (my $prettyver = $MTV_CFG->{Types}->{$File->{Version}}->{Display}) { $File->{Version} = $prettyver; } else { $File->{Version} = &ent($File->{Version}); } # # Warn about unknown Namespaces. if (&is_xml($File) and $File->{Namespace}) { my $ns = &ent($File->{Namespace}); if (&is_xhtml($File) and $File->{Namespace} ne 'http://www.w3.org/1999/xhtml') { &add_warning( $File, 'Warning:', "Unknown namespace («$ns») for text/html document!" ); } elsif (&is_svg($File) and $File->{Namespace} ne 'http://www.w3.org/2000/svg') { &add_warning( $File, 'Warning:', "Unknown namespace («$ns») for SVG document!" ); } } if (defined $File->{Tentative}) { my $class = ''; $class .= ($File->{Tentative} & 2 ? ' info' :''); $class .= ($File->{Tentative} & 4 ? ' warning' :''); $class .= ($File->{Tentative} & 8 ? ' error' :''); $class .= ($File->{Tentative} & 16 ? ' fatal' :''); unless ($File->{Tentative} == 1) { $File->{Notice} = <<".EOF.";

Please note that you have chosen one or more options that alter the content of the document before validation, or have not provided enough information to accurately validate the document. Even if no errors are reported below, the document will not be valid until you manually make the changes we have performed automatically. Specifically, if you used some of the options that override a property of the document (e.g. the DOCTYPE or Character Encoding), you must make the same change to the source document or the server setup before it can be valid. You will also need to insert an appropriate DOCTYPE Declaration or Character Encoding (the "charset" parameter for the Content-Type HTTP header) if any of those are missing.

.EOF. } } &prep_template($File, $T); if ($ctx->stash('comment_preview')) { $T->param(context => "comment"); } else { $T->param(context => "entry"); } # Hack to overcome shortcomings of onsgmls: # Reparse with an XML parser and see... if ($File->{'Is Valid'} && !($#{$File->{Warnings}} >= 0) && $MTV_CFG->{XHTML_Check}) { use XML::LibXML; my $parser = new XML::LibXML; $parser->line_numbers(1); eval { for (@{$File->{Content}}) { $parser->parse_chunk( $_ . "\n" ); } $parser->parse_chunk("", 1); }; if ($@) { my $messages = $@; # Line-number indicators begin on a new line $messages =~ s{[^\x0d\x0a](:\d+:)}{\n$1}g; # Strip Perl line numbers from error message. $messages =~ s{[^\x0d\x0a]+[\x0d\x0a]$}{}; &add_warning($File, "Error:", "
" . encode_entities(decode_entities($messages)) . "
" ); } } if ($File->{'Is Valid'} && !($#{$File->{Warnings}} >= 0) ) { $T->param(VALID => 1); &report_valid($File, $T); $ctx->stash('valid', 1); } else { $T->param(VALID => 0); $T->param(opt_show_source => 1); $T->param(file_errors => &report_errors($File)); $ctx->stash('valid', 0); } $T->param(file_warnings => $File->{Warnings}); $T->param(file_outline => &outline($File)); $T->param(file_source => &source($File)); $T->param(file_parsetree => &parsetree($File, $T)); # # Thhhhhat's all, folks! return $T->output; }); ############################################################################# # Subroutine definitions ############################################################################# # # Generate HTML report. sub prep_template ($$) { my $File = shift; my $T = shift; # # XML mode... $T->param(is_xml => &is_xml($File)); # # Metadata... $T->param(file_charset => $File->{Charset}->{Use}); $T->param(file_version => $File->{Version}); # # Output options... $T->param(opt_show_source => 0); $T->param(opt_show_outline => 0); $T->param(opt_show_parsetree => 0); $T->param(opt_show_noatt => $File->{Opt}->{'No Attributes'}); $T->param(opt_verbose => 1); # # Namespaces... $T->param(file_namespace => &ent($File->{Namespace})); $T->param(file_namespaces => $File->{Namespaces}) if $File->{Namespaces}; } # # Output "This page is Valid" report. sub report_valid { my $File = shift; my $T = shift; my $gifborder = ' border="0"'; my $xhtmlendtag = ''; my($image_uri, $alttext, $gifhw); unless ($File->{Version} eq 'unknown' or defined $File->{Tentative}) { if (defined $image_uri) { $T->param(have_badge => 1); $T->param(badge_uri => $image_uri); $T->param(badge_alt => $alttext); $T->param(badge_gifhw => $gifhw); $T->param(badge_xhtml => $xhtmlendtag); } } elsif (defined $File->{Tentative}) { $T->param(is_tentative => 1); } } # # Add a waring message to the output. 
sub add_warning ($$$) {push @{shift->{Warnings}}, {title => shift, text => shift}}; # # Print HTML explaining why/how to use a DOCTYPE Declaration. sub doctype_spiel { return <<".EOF.";

You should place a DOCTYPE declaration as the very first thing in your HTML document. For example, for a typical XHTML 1.0 document:

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
      <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
        <head>
          <title>Title</title>
        </head>

        <body>
          <!-- ... body of document ... -->
        </body>
      </html>
    

For XML documents, you may also wish to include an "XML Declaration" even before the DOCTYPE Declaration, but this is not well supported in older browsers. More information about this can be found in the XHTML 1.0 Recommendation.

.EOF. } # # Normalize newline forms (CRLF/CR/LF) to native newline. sub normalize_newlines { my $file = shift; shift; #charset my $pattern = ''; # don't use backreference parentheses! $pattern = '\x00\x0D(?:\x00\x0A)?|\x00\x0A' if /^utf-16be$/; $pattern = '\x0D\x00(?:\x0A\x00)?|\x0A\x00' if /^utf-16le$/; # $pattern = '\x00\x00\x00\x0D(?:\x00\x00\x00\x0A)?|\x00\x00\x00\x0A' if /^UCS-4be$/; # $pattern = '\x0D\x00\x00\x00(?:\x0A\x00\x00\x00)?|\x0A\x00\x00\x00' if /^UCS-4le$/; # insert other special cases here, such as EBCDIC $pattern = '\x0D(?:\x0A)?|\x0A' if !$pattern; # all other cases return [split /$pattern/, $file]; } # # find exact charset from general one (utf-16) # # needed for per-line conversion and line splitting # (BE is default, but this will apply only to HTML) sub exact_charset { my $File = shift; my $general_charset = shift; my $exact_charset = $general_charset; if ($general_charset eq 'utf-16') { if ($File->{Charset}->{Auto} =~ m/^utf-16[bl]e$/) { $exact_charset = $File->{Charset}->{Auto}; } else { $exact_charset = 'utf-16be'; } } # add same code for ucs-4 here return $exact_charset; } # # Return $_[0] encoded for HTML entities (cribbed from merlyn). # # Note that this is used both for HTML and XML escaping. # sub ent { shift; return '' unless defined; # Eliminate warnings s(["<&>"]){'&#' . ord($&) . ';'}ge; # should switch to hex sooner or later return $_; } # # Truncate source lines for report. # # This *really* wants Perl 5.8.0 and it's improved UNICODE support. # Byte semantics are in effect on all length(), substr(), etc. calls, # so offsets will be wrong if there are multi-byte sequences prior to # the column where the error is detected. # sub truncate_line { my $line = shift; my $col = shift; my $start = $col; my $end = $col; for (1..40) { $start-- if ($start - 1 >= 0); # in/de-crement until... $end++ if ($end + 1 <= length $line); # ...we hit end of line. 
} unless ($end - $start == 80) { if ($start == 0) { # Hit start of line, maybe grab more at end. my $diff = 40 - $col; for (1..$diff) { $end++ if ($end + 1 <= length $line); } } elsif ($end == length $line) { # Hit end of line, maybe grab more at beginning. my $diff = 80 - $col; for (1..$diff) { $start-- if ($start - 1 >= 0); } } } # # Add elipsis at end if necessary. unless ($end == length $line) {substr $line, -3, 3, '...'}; $col = $col - $start; # New offset is diff from $col to $start. $line = substr $line, $start, $end - $start; # Truncate. # # Add elipsis at start if necessary. unless ($start == 0) {substr $line, 0, 3, '...'}; return $line, $col; } # # Parse errors reported by SP. sub parse_errors ($$$) { my $File = shift; my $fh = shift; my $E = shift; $File->{Errors} = []; # Initialize to an (empty) anonymous array ref. for (<$fh>) { push @{$File->{'DEBUG'}->{Errors}}, $_; my($err, @errors); next if /^0:[0-9]+:[0-9]+:[^A-Z]/; next if /numbers exceeding 65535 not supported/; next if /URL Redirected to/; my(@_err) = split /:/; next unless $_err[1] eq '0'; if ($_err[1] =~ m(^)) { @errors = ($_err[0], join(':', $_err[1], $_err[2]), @_err[3..$#_err]); } else { @errors = @_err; } $err->{src} = $errors[1]; $err->{line} = $errors[2]; $err->{char} = $errors[3]; $err->{num} = $errors[4] || ''; (my $rec, $err->{num}) = split /\./, $err->{num}; $err->{type} = $errors[5] || ''; if ($err->{type} eq 'E' or $err->{type} eq 'X' or $err->{type} eq 'Q') { $err->{msg} = encode_entities(decode_entities($errors[6])); } elsif ($err->{type} eq 'W') { my $escapederrormsg = encode_entities(decode_entities($errors[6])); &add_warning( $File, 'Warning:', "Line $err->{line}, column $err->{char}: $escapederrormsg" ); $err->{msg} = $escapederrormsg; } else { $err->{type} = 'I'; $err->{msg} = $errors[4]; $err->{num} = ''; } # Strip curlies from lq-nsgmls output. $err->{msg} =~ s/[{}]//g; # An unknown FPI and no SI. 
if ($err->{msg} =~ m(cannot generate system identifier for entity) or $err->{msg} =~ m(unrecognized DOCTYPE)i or $err->{msg} =~ m(no document type declaration)i) { $File->{'Error Flagged'} = 1; $File->{'Error Message'} = <<".EOF.";

Fatal Error: $err->{msg}

I could not parse this document, because it uses a public identifier that is not in my catalog.

.EOF. $File->{'Error Message'} .= &doctype_spiel(); $File->{'Error Message'} .= "
\n"; } # No or unknown FPI and a relative SI. if ($err->{msg} =~ m(cannot (open|find))) { $File->{'Error Flagged'} = 1; $File->{'Error Message'} = <<".EOF.";

Fatal Error: $err->{msg}

I could not parse this document, because it makes reference to a system-specific file instead of using a well-known public identifier to specify the type of markup being used.

.EOF. $File->{'Error Message'} .= &doctype_spiel(); $File->{'Error Message'} .= "
\n"; } # No DOCTYPE. if ($err->{msg} =~ m(prolog can\'t be omitted)) { $File->{'Error Flagged'} = 1; $File->{'Error Message'} = <<".EOF.";

Fatal Error: No DOCTYPE specified!

I could not parse this document, because it does not include a DOCTYPE Declaration. A DOCTYPE Declaration is mandatory for most current markup languages and without such a declaration it is impossible to validate this document.

.EOF. $File->{'Error Message'} .= &doctype_spiel(); $File->{'Error Message'} .= <<".EOF.";

The W3C QA Activity maintains a List of Valid Doctypes that you can choose from, and the WDG maintains a document on "Choosing a DOCTYPE".

.EOF. $File->{'Error Message'} .= "
\n"; } return undef if &abort_if_error_flagged($File, $E, 4); push @{$File->{Errors}}, $err; } undef $fh; return $File; } # # Generate a HTML report of detected errors. sub report_errors ($) { my $File = shift; my $Errors = []; # populate a hash with the verbose error messages. my $verbose_file = "$vdir/config/verbose.cfg"; open(FH, "<", $verbose_file) or die "cannot open verbose message file ($!)"; my %verbose_msgs = (); my ($errno, $vmsg); while () { ($errno, $vmsg) = split /\t/; $verbose_msgs{$errno} = $vmsg; } close FH; if (scalar @{$File->{Errors}}) { foreach my $err (@{$File->{Errors}}) { my($line, $col) = &truncate_line($File->{Content}->[$err->{line}-1], $err->{char}); # Strip curlies from lq-nsgmls output. $err->{msg} =~ s/[{}]//g; $err->{msg} =~ s/(^\s|\s\Z)//g; # Remove leading and trailing spaces. # Find index into the %frag hash for the "explanation..." links. $err->{idx} = $err->{msg}; $err->{idx} =~ s/"[^\"]*"/FOO/g; $err->{idx} =~ s/[^A-Za-z ]//g; $err->{idx} =~ s/\s+/ /g; # Collapse spaces $err->{idx} =~ s/(^\s|\s\Z)//g; # Remove leading and trailing spaces. ) $err->{idx} =~ s/(FOO )+/FOO /g; # Collapse FOOs. $err->{idx} =~ s/FOO FOO/FOO/g; # Collapse FOOs. if (exists $verbose_msgs{$err->{num}}) { $err->{verbose} = $verbose_msgs{$err->{num}}; } $line = &ent($line); # Entity encode. $line =~ s/\t/ /g; # Collapse TABs. if (defined $MTV_CFG->{Error_to_URI}->{$err->{idx}}) { $err->{uri} = $MTV_CFG->{Msg_FAQ_URI} . '#' . $MTV_CFG->{Error_to_URI}->{$err->{idx}}; } $err->{src} = $line; $err->{col} = ' ' x $col; push @{$Errors}, $err; } } return $Errors; } # # Produce an outline of the document based on Hn elements from the ESIS. sub outline { my $File = shift; my $outline = ''; my $prevlevel = 0; my $indent = 0; my $level = 0; for (1 .. $#{$File->{ESIS}}) { my $line = $File->{ESIS}->[$_]; next unless $line =~ /^\(H([1-6])$/i; $prevlevel = $level; $level = $1; $outline .= " \n" x ($prevlevel - $level); # perl is so cool. 
    if ($level - $prevlevel == 1) {$outline .= "    <ul>\n"};
    foreach my $i (($prevlevel + 1) .. ($level - 1)) {
      $outline .= qq(    <ul>\n      <li>A level $i heading is missing!</li>\n);
    }
    if ($level - $prevlevel > 1) {$outline .= "    <ul>\n"};

    $line = '';
    my $heading = '';
    until (substr($line, 0, 3) =~ /^\)H$level/i) {
      $line = $File->{ESIS}->[$_++];
      $line =~ s/\\011/ /g;
      $line =~ s/\\012/ /g;
      if ($line =~ /^-/) {
        my $headcont = $line;
        substr($headcont, 0, 1) = " ";
        $headcont =~ s/\\n/ /g;
        $heading .= $headcont;
      } elsif ($line =~ /^AALT CDATA( .+)/i) {
        my $headcont = $1;
        $headcont =~ s/\\n/ /g;
        $heading .= $headcont;
      }
    }

    $heading = substr($heading, 1); # chop the leading '-' or ' '.
    $heading = &ent($heading);
    $outline .= "    <li>$heading</li>\n";
  }
  $outline .= "    </ul>\n" x $level;
  return $outline;
}


#
# Create a HTML representation of the document.
sub source {
  my $File = shift;
  my $line = 1;
  my @source = ();

  for (@{$File->{Content}}) {
    push @source, {
                   file_source_i    => $line,
                   file_source_line => ent $_,
                  };
    $line++;
  }
  return \@source;
}


#
# Create a HTML Parse Tree of the document for validation report.
sub parsetree ($$) {
  my ($File, $T) = @_;
  my $tree = '';

  $T->param(file_parsetree_noatt => 1) if $File->{Opt}->{'No Attributes'};

  my $indent   = 0;
  my $prevdata = '';

  foreach my $line (@{$File->{ESIS}}) {
    if ($File->{Opt}->{'No Attributes'}) { # don't show attributes
      next if $line =~ /^A/;
      next if $line =~ /^\(A$/;
      next if $line =~ /^\)A$/;
    }

    $line =~ s/\\n/ /g;
    $line =~ s/\\011/ /g;
    $line =~ s/\\012/ /g;
    $line =~ s/\s+/ /g;
    next if $line =~ /^-\s*$/;

    if ($line =~ /^-/) {
      substr($line, 0, 1) = ' ';
      $prevdata .= $line;
      next;
    } elsif ($prevdata) {
      $prevdata = &ent($prevdata);
      $prevdata =~ s/\s+/ /go;
      $tree .= wrap(' ' x $indent, ' ' x $indent, $prevdata) . "\n";
      undef $prevdata;
    }

    $line = &ent($line);
    if ($line =~ /^\)/) {
      $indent -= 2;
    }

    my $printme;
    chomp($printme = $line);
#    $printme =~ s{^([()])(.*)} # reformat and add links on HTML elements
#                 { my $close = '';
#                   $close = "/" if $1 eq ")"; # ")" -> close-tag
#                   "<" . $close . "<a href=\"" . $MTV_CFG->{Element_Ref_URI} . $MTV_CFG->{Element_Map}->{lc($2)} .
#                   "\">$2<\/a>>"
#                 }egx;
    $printme =~ s,^A, A,; # indent attributes a bit
    $tree .= ' ' x $indent . $printme . "\n";

    if ($line =~ /^\(/) {
      $indent += 2;
    }
  }
  return $tree;
}


#
# Do an initial parse of the Document Entity to extract charset and FPI.
sub preparse {
  my $File = shift;

  #
  # Reset DOCTYPE, Root, and Charset (for second invocation).
  $File->{Charset}->{META} = '';
  $File->{DOCTYPE}         = '';
  $File->{Root}            = '';

  my $dtd = sub {
    return if $File->{Root};
    ($File->{Root}, $File->{DOCTYPE}) = shift =~ m()si;
  };

  my $start = sub {
    my $tag  = shift;
    my $attr = shift;
    my %attr = map {lc($_) => $attr->{$_}} keys %{$attr};

    if ($File->{Root}) {
      if (lc $tag eq 'meta') {
        if (lc $attr{'http-equiv'} eq 'content-type') {
          if ($attr{content} =~ m(charset\s*=[\s\"\']*([^\s;\"\'>]*))si) {
            $File->{Charset}->{META} = lc $1;
          }
        }
      }
      return unless $tag eq $File->{Root};
    } else {
      $File->{Root} = $tag;
    }
    if ($attr->{xmlns}) {$File->{Namespace} = $attr->{xmlns}};
  };

  my $p = HTML::Parser->new(api_version => 3);
  $p->xml_mode(1);
  $p->ignore_elements('BODY');
  $p->ignore_elements('body');
  $p->handler(declaration => $dtd, 'text');
  $p->handler(start => $start, 'tag,attr');
  $p->parse(join "\n", @{$File->{Content}});

  $File->{DOCTYPE} = '' unless defined $File->{DOCTYPE};
  $File->{DOCTYPE} =~ s(^\s+){ }g;
  $File->{DOCTYPE} =~ s(\s+$){ }g;
  $File->{DOCTYPE} =~ s(\s+) { }g;
#  $File->{DOCTYPE} = '-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN';

  return $File;
}


#
# Utility subs to tell if type "is" something.
sub is_xml   {shift->{Type} =~ m(^[^+]+\+xml$)};
sub is_svg   {shift->{Type} =~ m(svg\+xml$)};
sub is_xhtml {shift->{Type} =~ m(xhtml\+xml$)};


#
# Check charset conflicts and add any warnings necessary.
sub charset_conflicts {
  my $File = shift;

  #
  # Handle the case where there was no charset to be found.
  unless ($File->{Charset}->{Use}) {
    &add_warning($File, 'No Character Encoding detected!', <<".EOF.");
  To ensure correct validation, processing, and display, it is important
  that the character encoding is properly labeled.
  More information...
.EOF.
    $File->{Tentative} |= 4;
  }

  my $cs_use  = $File->{Charset}->{Use}  ? &ent($File->{Charset}->{Use})  : '';
  my $cs_opt  = $File->{Opt}->{Charset}  ? &ent($File->{Opt}->{Charset})  : '';
  my $cs_http = $File->{Charset}->{HTTP} ? &ent($File->{Charset}->{HTTP}) : '';
  my $cs_xml  = $File->{Charset}->{XML}  ?
    &ent($File->{Charset}->{XML}) : '';
  my $cs_meta = $File->{Charset}->{META} ? &ent($File->{Charset}->{META}) : '';

  #
  # Add a warning if there was charset info conflict (HTTP header,
  # XML declaration, or <meta> element).
  if (&conflict($File->{Charset}->{HTTP}, $File->{Charset}->{XML})) {
    &add_warning($File, 'Character Encoding mismatch!', <<".EOF.");
  The character encoding from the HTTP header ($cs_http) is different from
  the value in the XML declaration ($cs_xml). I will use the value from the
  HTTP header ($cs_use) for this validation.
.EOF.
  } elsif (&conflict($File->{Charset}->{HTTP}, $File->{Charset}->{META})) {
    &add_warning($File, 'Character Encoding mismatch!', <<".EOF.");
  The character encoding from the HTTP header ($cs_http) is different from
  the value in the &lt;meta&gt; element ($cs_meta). I will use the value from the
  HTTP header ($cs_use) for this validation.
.EOF.
  } elsif (&conflict($File->{Charset}->{XML}, $File->{Charset}->{META})) {
    &add_warning($File, 'Character Encoding mismatch!', <<".EOF.");
  The character encoding from the XML declaration ($cs_xml) is different from
  the value in the &lt;meta&gt; element ($cs_meta). I will use the value from the
  XML declaration ($cs_xml) for this validation.
.EOF.
    $File->{Tentative} |= 4;
  }

  return $File;
}


#
# Transcode to UTF-8
sub transcode {
  my $File = shift;

  my ($command, $result_charset) =
    split " ", $MTV_CFG->{Charsets}->{$File->{Charset}->{Use}}, 2;
  $result_charset = exact_charset($File, $result_charset);
  if ($command eq 'I') {
    # test if given charset is available
    eval {my $c = Text::Iconv->new($result_charset, 'utf-8')};
    $command = '' if $@;
  } elsif ($command eq 'X') {
    $@ = "$File->{Charset}->{Use} undefined; replace by $result_charset";
  }

  if ($command ne 'I') {
    my $cs = &ent($File->{Charset}->{Use});
    $File->{'Error Flagged'} = 1;
    $File->{'Error Message'} = sprintf(<<".EOF.", $cs, &ent($@));

      Sorry! A fatal error occurred when attempting to transcode the character encoding of the document. Either we do not support this character encoding yet, or you have specified a non-existent character encoding (often a misspelling).

      The detected character encoding was "%s".

      The error was "%s".

      If you believe the character encoding to be valid you can submit a request for that character encoding (see the feedback page for details) and we will look into supporting it in the future.

.EOF.
    $File->{'Error Message'} .= <<'.EOF.';

      IANA maintains the list of official names for character sets.

.EOF.
    return $File;
  }

  my $c = Text::Iconv->new($result_charset, 'utf-8');
  my $line = 0;
  for (@{$File->{Content}}) {
    my $in = $_;
    $line++;
    $_ = $c->convert($_); # $_ is local!!
    if ($in ne "" and $_ eq "") {
      push @{$File->{Lines}}, $line;
      $_ = "#### encoding problem on this line, not shown ####";
    }
  }
  return $File;
}


#
# Check correctness of UTF-8 both for UTF-8 input and for conversion results
sub check_utf8 {
  my $File = shift;

  for (my $i = 0; $i < $#{$File->{Content}}; $i++) {
    # substitution needed for very long lines (>32K), to avoid backtrack
    # stack overflow. Handily, this also happens to count characters.
    $_ = unpack 'C*', $File->{Content}->[$i]; # make sure we're doing byte-wise comparison
    my $count = s/  [\x00-\x7F]                        # ASCII
                  | [\xC2-\xDF]        [\x80-\xBF]     # non-overlong 2-byte sequences
                  | \xE0[\xA0-\xBF]    [\x80-\xBF]     # excluding overlongs
                  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte sequences
                  | \xED[\x80-\x9F]    [\x80-\xBF]     # excluding surrogates
                  | \xF0[\x90-\xBF]    [\x80-\xBF]{2}  # planes 1-3
                  | [\xF1-\xF3]        [\x80-\xBF]{3}  # planes 4-15
                  | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
                 //xg;
    if (length) {
      push @{$File->{Lines}}, ($i+1);
      $File->{Content}->[$i] = "#### encoding problem on this line, not shown ####";
      $count = 50; # length of above text
    }
    $count += 0; # Force numeric.
    $File->{Offsets}->[$i + 1] = [$count, $File->{Offsets}->[$i]->[1] + $count];
  }
  return $File;
}


#
# byte error analysis
sub byte_error {
  my $File = shift;
  my @lines = @{$File->{Lines}};
  if (scalar @lines) {
    $File->{'Error Flagged'} = 1;
    my $s = $#lines ? 's' : '';
    my $lines = join ', ', split ',', Set::IntSpan->new(\@lines)->run_list;
    my $cs = &ent($File->{Charset}->{Use});
    $File->{'Error Message'} = <<".EOF.";

      Sorry, I am unable to validate this document because on line$s $lines it contained one or more bytes that I cannot interpret as $cs (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

.EOF.
  }
  return $File;
}


#
# Autodetection as in Appendix F of the XML 1.0 Recommendation.
#
#
# return values are: (base_encoding, BOMSize, Size, Pattern)
sub find_base_encoding {
  local $_ = shift; # the patterns below match against the bytes passed in

  # With a Byte Order Mark:
  return ('ucs-4be',  4, 4, '\0\0\0(.)')
    if /^\x00\x00\xFE\xFF/; # UCS-4, big-endian machine (1234)
  return ('ucs-4le',  4, 4, '(.)\0\0\0')
    if /^\xFF\xFE\x00\x00/; # UCS-4, little-endian machine (4321)
  return ('utf-16be', 2, 2, '\0(.)')
    if /^\xFE\xFF/;         # UTF-16, big-endian.
  return ('utf-16le', 2, 2, '(.)\0')
    if /^\xFF\xFE/;         # UTF-16, little-endian.
  return ('utf-8',    3, 1, '')
    if /^\xEF\xBB\xBF/;     # UTF-8.

  # Without a Byte Order Mark:
  return ('ucs-4be',  0, 4, '\0\0\0(.)')
    if /^\x00\x00\x00\x3C/; # UCS-4 or 32bit; big-endian machine (1234 order).
  return ('ucs-4le',  0, 4, '(.)\0\0\0')
    if /^\x3C\x00\x00\x00/; # UCS-4 or 32bit; little-endian machine (4321 order).
  return ('utf-16be', 0, 2, '\0(.)')
    if /^\x00\x3C\x00\x3F/; # UCS-2, UTF-16, or 16bit; big-endian.
  return ('utf-16le', 0, 2, '(.)\0')
    if /^\x3C\x00\x3F\x00/; # UCS-2, UTF-16, or 16bit; little-endian.
  return ('utf-8',    0, 1, '')
    if /^\x3C\x3F\x78\x6D/; # UTF-8, ISO-646, ASCII, ISO-8859-*, Shift-JIS, EUC, etc.
  return ('ebcdic',   0, 1, '')
    if /^\x4C\x6F\xA7\x94/; # EBCDIC
  return ('',         0, 1, ''); # nothing in particular
}


#
# Find encoding in document according to XML rules
# Only meaningful if file contains a BOM, or for well-formed XML!
sub find_xml_encoding {
  my $File = shift;
  my ($CodeUnitSize, $Pattern);

  ($File->{Charset}->{Auto}, $File->{BOM}, $CodeUnitSize, $Pattern)
    = &find_base_encoding($File->{Bytes});
  my $someBytes = substr $File->{Bytes}, $File->{BOM}, ($CodeUnitSize * 100);
  my $someText = ''; # 100 arbitrary, but enough in any case

  # translate from guessed encoding to ascii-compatible
  if ($File->{Charset}->{Auto} eq 'ebcdic') {
    # special treatment for EBCDIC, maybe use tr///
    # work on this later
  }
  elsif (!$Pattern) {
    $someText = $someBytes; # efficiency shortcut
  }
  else { # generic code for UTF-16/UCS-4
    $someBytes =~ /^(($Pattern)*)/s;
    $someText = $1;                # get initial piece without chars >255
    $someText =~ s/$Pattern/$1/sg; # select the relevant bytes
  }

  # try to find encoding pseudo-attribute
  my $s = '[\ \t\n\r]';
  $someText =~ m(^<\?xml $s+ version $s* = $s* ([\'\"]) [-._:a-zA-Z0-9]+ \1 $s+
                 encoding $s* = $s* ([\'\"]) ([A-Za-z][-._A-Za-z0-9]*) \2
                )xso;

  $File->{Charset}->{XML} = lc $3;
  return $File;
}


#
# Abort with a message if an error was flagged at point.
sub abort_if_error_flagged ($$$) {
  my ($File, $E, $Flags) = @_;
  return 0 unless $File->{'Error Flagged'};
  &prep_template($File, $E);
  $E->param(error_message => $File->{'Error Message'});
  return 1;
}


#
# conflicting encodings
sub conflict ($$) {return $_[0] && $_[1] && ($_[0] ne $_[1])}

1;

mtvalidate-0.5/validator/templates/
mtvalidate-0.5/validator/templates/fatal-error.tmpl
mtvalidate-0.5/validator/templates/invalid.tmpl

      Your is not valid !
      Please correct the errors below and resubmit.

      These are the results of checking your comment for XML well-formedness and validity.

      These are the results of attempting to parse your comment with an SGML parser.

      Errors:

      1. Line ">, column : (" target="errors">explain...).
        
        ^

        (]%20New%20Error%20Message%20Suggestion">Send the W3C feedback on error #.)

mtvalidate-0.5/validator/templates/opt_show_outline.tmpl

      Outline

      Below is an outline for this document, automatically generated from the heading tags (<h1> through <h6>.)

      If this does not look like a real outline, it is likely that the heading tags are not being used properly. (Headings should reflect the logical structure of the document; they should not be used simply to add emphasis, or to change the font size.)

mtvalidate-0.5/validator/templates/opt_show_parsetree.tmpl

      Parse Tree

      I am excluding the attributes, as you requested. You can also view this parse tree without attributes by selecting the appropriate option on the form.

      
          
mtvalidate-0.5/validator/templates/opt_show_source.tmpl

      Source Listing

      Below is the source input I used for this validation:

      1. ">
mtvalidate-0.5/validator/templates/result.tmpl
mtvalidate-0.5/validator/templates/table.tmpl
      Encoding:
      Doctype:
      Root Namespace: ">
      Other Namespaces
mtvalidate-0.5/validator/templates/tip.tmpl
      Tip Of The Day:
      ">
mtvalidate-0.5/validator/templates/valid.tmpl

      Your is Tentatively valid (Tentatively Valid)!

      Your is valid !
      Click on POST to submit it, or scroll down to edit it.