Package org.w3c.tidy

Class Configuration

  • All Implemented Interfaces:
    java.io.Serializable

    public class Configuration
    extends java.lang.Object
    implements java.io.Serializable
    Read configuration file and manage configuration properties. Configuration files associate a property name with a value. The format is that of a Java .properties file.
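As a minimal sketch of that file format, the snippet below parses a small configuration text with java.util.Properties, the standard reader for .properties syntax. The class name ConfigFormatSketch and the option names (indent, wrap, tidy-mark) are assumptions made for illustration and are not taken from this page; isKnownOption(String) can confirm which names are actually accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not part of org.w3c.tidy.
public class ConfigFormatSketch {

    // Parses .properties-style text into name/value pairs.
    public static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Option names below are assumed for illustration only.
        String text = String.join("\n",
            "# lines starting with # are comments",
            "indent=yes",
            "wrap: 72",
            "tidy-mark = no");
        Properties props = load(text);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " -> " + props.getProperty(name));
        }
    }
}
```

A Properties object built this way has the type that addProps(java.util.Properties) expects.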
    Version:
    $Revision: 817 $ ($Author: steffenyount $)
    Author:
    Dave Raggett dsr@w3.org , Andy Quick ac.quick@sympatico.ca (translation to Java), Fabrizio Giustina
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.lang.String altText
      default text for alt attribute.
      static int ASCII
      Deprecated. 
      protected boolean asciiChars
      convert quotes and dashes to nearest ASCII char.
      static int BIG5
      Deprecated. 
      protected boolean bodyOnly
      output BODY content only.
      protected boolean breakBeforeBR
      output a newline before br or not?
      protected boolean burstSlides
      create slides on each h2 element.
      protected java.lang.String cssPrefix
      CSS class naming for -clean option.
      protected int definedTags
      track what types of tags user has defined to eliminate unnecessary searches.
      static int DOCTYPE_AUTO
      treatment of doctype: auto.
      static int DOCTYPE_LOOSE
      treatment of doctype: loose.
      static int DOCTYPE_OMIT
      treatment of doctype: omit.
      static int DOCTYPE_STRICT
      treatment of doctype: strict.
      static int DOCTYPE_USER
      treatment of doctype: user.
      protected int docTypeMode
      see doctype property.
      protected java.lang.String docTypeStr
      user specified doctype.
      protected boolean dropEmptyParas
      discard empty p elements.
      protected boolean dropFontTags
      discard presentation tags.
      protected boolean dropProprietaryAttributes
      discard proprietary attributes.
      protected int duplicateAttrs
      Keep first or last duplicate attribute.
      protected boolean emacs
      if true, format error output for GNU Emacs.
      protected boolean encloseBlockText
      if yes, text in blocks is wrapped in p elements.
      protected boolean encloseBodyText
      if yes, text at body level is wrapped in p elements.
      protected java.lang.String errfile
      file name to write errors to.
      protected boolean escapeCdata
      replace CDATA sections with escaped text.
      protected boolean fixBackslash
      fix URLs by replacing \ with /.
      protected boolean fixComments
      fix comments with adjacent hyphens.
      protected boolean fixUri
      properly escape URLs.
      protected boolean forceOutput
      output document even if errors were found.
      protected boolean hideComments
      hides all (real) comments in output.
      protected boolean hideEndTags
      suppress optional end tags.
      protected boolean htmlOut
      output plain-old HTML, even for XHTML input.
      protected boolean indentAttributes
      newline+indent before each attribute.
      protected boolean indentCdata
      indent CDATA sections.
      protected boolean indentContent
      indent content of appropriate tags.
      static int ISO2022
      Deprecated. 
      protected boolean joinClasses
      join multiple class attributes.
      protected boolean joinStyles
      join multiple style attributes.
      static int KEEP_FIRST
      Keep first duplicate attribute.
      static int KEEP_LAST
      Keep last duplicate attribute.
      protected boolean keepFileTimes
      if yes, last modified time is preserved.
      protected java.lang.String language
      RJ language property.
      static int LATIN1
      Deprecated. 
      protected boolean literalAttribs
      if true, attributes may use newlines.
      protected boolean logicalEmphasis
      replace i by em and b by strong.
      protected boolean lowerLiterals
      folds known attribute values to lower case.
      static int MACROMAN
      Deprecated. 
      protected boolean makeBare
      Make bare HTML: remove Microsoft cruft.
      protected boolean makeClean
      remove presentational clutter.
      protected boolean ncr
      allow numeric character references.
      protected char[] newline
      bytes for the newline marker.
      protected boolean numEntities
      use numeric entities.
      protected boolean onlyErrors
      if true, normal output is suppressed.
      protected boolean quiet
      suppress 'Parsing X', guessed DTD, and summary messages.
      protected boolean quoteAmpersand
      output naked ampersand as &amp;.
      protected boolean quoteMarks
      output " marks as &quot;.
      protected boolean quoteNbsp
      output non-breaking space as entity.
      static int RAW
      Deprecated.
      use Tidy.setRawOut(true) for raw output
      protected boolean rawOut
      Avoid mapping values > 127 to entities.
      protected boolean replaceColor
      replace hex color attribute values with names.
      protected java.lang.String replacementCharEncoding
      char encoding used when replacing illegal SGML chars, regardless of specified encoding.
      protected Report report
      Report instance.
      static int SHIFTJIS
      Deprecated. 
      protected int showErrors
      number of errors to put out.
      protected boolean showWarnings
      show warnings; errors are always shown regardless.
      protected java.lang.String slidestyle
      Deprecated.
      does nothing
      protected boolean smartIndent
      does text/block level content affect indentation.
      protected int spaces
      default indentation.
      protected int tabsize
      default tab size (8).
      protected boolean tidyMark
      add meta element indicating tidied doc.
      protected boolean trimEmpty
      trim empty elements.
      protected TagTable tt
      TagTable associated with this Configuration.
      protected boolean upperCaseAttrs
      output attributes in upper not lower case.
      protected boolean upperCaseTags
      output tags in upper not lower case.
      static int UTF16
      Deprecated. 
      static int UTF16BE
      Deprecated. 
      static int UTF16LE
      Deprecated. 
      static int UTF8
      Deprecated. 
      static int WIN1252
      Deprecated. 
      protected boolean word2000
      draconian cleaning for Word2000.
      protected boolean wrapAsp
      wrap within ASP pseudo elements.
      protected boolean wrapAttVals
      wrap within attribute values.
      protected boolean wrapJste
      wrap within JSTE pseudo elements.
      protected int wraplen
      default wrap margin (68).
      protected boolean wrapPhp
      wrap within PHP pseudo elements.
      protected boolean wrapScriptlets
      wrap within JavaScript string literals.
      protected boolean wrapSection
      wrap within CDATA section tags.
      protected boolean writeback
      if true, output tidied markup.
      protected boolean xHTML
      output extensible HTML.
      protected boolean xmlOut
      create output as XML.
      protected boolean xmlPi
      add <?xml?> for XML docs.
      protected boolean xmlPIs
      If set to yes, PIs must end with ?>.
      protected boolean xmlSpace
      if set to yes, adds xml:space attr as needed.
      protected boolean xmlTags
      treat input as XML.
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected Configuration​(Report report)
      Instantiates a new Configuration.
    • Method Summary

      Modifier and Type Method Description
      void addProps​(java.util.Properties p)
      adds configuration Properties.
      void adjust()
      Ensure that config is self consistent.
      protected java.lang.String convertCharEncoding​(int code)
      Convert a char encoding from the deprecated tidy constant to a standard java encoding name.
      protected java.lang.String getInCharEncodingName()
      Getter for inCharEncodingName.
      protected java.lang.String getOutCharEncodingName()
      Getter for outCharEncodingName.
      static boolean isKnownOption​(java.lang.String name)
      Is the given String a valid configuration flag?
      void parseFile​(java.lang.String filename)
      Parses a property file.
      void printConfigOptions​(java.io.Writer errout, boolean showActualConfiguration)
      prints available configuration options.
      protected void setInCharEncoding​(int encoding)
      Deprecated.
      use setInCharEncodingName(String)
      protected void setInCharEncodingName​(java.lang.String encoding)
      Setter for inCharEncodingName.
      protected void setInOutEncodingName​(java.lang.String encoding)
      Setter for inOutCharEncodingName.
      protected void setOutCharEncoding​(int encoding)
      Deprecated.
      use setOutCharEncodingName(String)
      protected void setOutCharEncodingName​(java.lang.String encoding)
      Setter for outCharEncodingName.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • RAW

        public static final int RAW
        Deprecated.
        use Tidy.setRawOut(true) for raw output
        character encoding = RAW.
        See Also:
        Constant Field Values
      • ASCII

        public static final int ASCII
        Deprecated.
        character encoding = ASCII.
        See Also:
        Constant Field Values
      • LATIN1

        public static final int LATIN1
        Deprecated.
        character encoding = LATIN1.
        See Also:
        Constant Field Values
      • UTF8

        public static final int UTF8
        Deprecated.
        character encoding = UTF8.
        See Also:
        Constant Field Values
      • ISO2022

        public static final int ISO2022
        Deprecated.
        character encoding = ISO2022.
        See Also:
        Constant Field Values
      • MACROMAN

        public static final int MACROMAN
        Deprecated.
        character encoding = MACROMAN.
        See Also:
        Constant Field Values
      • UTF16LE

        public static final int UTF16LE
        Deprecated.
        character encoding = UTF16LE.
        See Also:
        Constant Field Values
      • UTF16BE

        public static final int UTF16BE
        Deprecated.
        character encoding = UTF16BE.
        See Also:
        Constant Field Values
      • UTF16

        public static final int UTF16
        Deprecated.
        character encoding = UTF16.
        See Also:
        Constant Field Values
      • WIN1252

        public static final int WIN1252
        Deprecated.
        character encoding = WIN1252.
        See Also:
        Constant Field Values
      • BIG5

        public static final int BIG5
        Deprecated.
        character encoding = BIG5.
        See Also:
        Constant Field Values
      • SHIFTJIS

        public static final int SHIFTJIS
        Deprecated.
        character encoding = SHIFTJIS.
        See Also:
        Constant Field Values
      • DOCTYPE_OMIT

        public static final int DOCTYPE_OMIT
        treatment of doctype: omit.
        See Also:
        Constant Field Values
        To do:
        should be an enumeration DocTypeMode
      • DOCTYPE_AUTO

        public static final int DOCTYPE_AUTO
        treatment of doctype: auto.
        See Also:
        Constant Field Values
      • DOCTYPE_STRICT

        public static final int DOCTYPE_STRICT
        treatment of doctype: strict.
        See Also:
        Constant Field Values
      • DOCTYPE_LOOSE

        public static final int DOCTYPE_LOOSE
        treatment of doctype: loose.
        See Also:
        Constant Field Values
      • DOCTYPE_USER

        public static final int DOCTYPE_USER
        treatment of doctype: user.
        See Also:
        Constant Field Values
      • KEEP_LAST

        public static final int KEEP_LAST
        Keep last duplicate attribute.
        See Also:
        Constant Field Values
        To do:
        should be an enumeration DupAttrMode
      • KEEP_FIRST

        public static final int KEEP_FIRST
        Keep first duplicate attribute.
        See Also:
        Constant Field Values
      • spaces

        protected int spaces
        default indentation.
      • wraplen

        protected int wraplen
        default wrap margin (68).
      • tabsize

        protected int tabsize
        default tab size (8).
      • docTypeMode

        protected int docTypeMode
        see doctype property.
      • duplicateAttrs

        protected int duplicateAttrs
        Keep first or last duplicate attribute.
      • altText

        protected java.lang.String altText
        default text for alt attribute.
      • slidestyle

        protected java.lang.String slidestyle
        Deprecated.
        does nothing
        style sheet for slides.
      • language

        protected java.lang.String language
        RJ language property.
      • docTypeStr

        protected java.lang.String docTypeStr
        user specified doctype.
      • errfile

        protected java.lang.String errfile
        file name to write errors to.
      • writeback

        protected boolean writeback
        if true, output tidied markup.
      • onlyErrors

        protected boolean onlyErrors
        if true, normal output is suppressed.
      • showWarnings

        protected boolean showWarnings
        show warnings; errors are always shown regardless.
      • quiet

        protected boolean quiet
        suppress 'Parsing X', guessed DTD, and summary messages.
      • indentContent

        protected boolean indentContent
        indent content of appropriate tags.
      • smartIndent

        protected boolean smartIndent
        does text/block level content affect indentation.
      • hideEndTags

        protected boolean hideEndTags
        suppress optional end tags.
      • xmlTags

        protected boolean xmlTags
        treat input as XML.
      • xmlOut

        protected boolean xmlOut
        create output as XML.
      • xHTML

        protected boolean xHTML
        output extensible HTML.
      • htmlOut

        protected boolean htmlOut
        output plain-old HTML, even for XHTML input. Yes means set explicitly.
      • xmlPi

        protected boolean xmlPi
        add <?xml?> for XML docs.
      • upperCaseTags

        protected boolean upperCaseTags
        output tags in upper not lower case.
      • upperCaseAttrs

        protected boolean upperCaseAttrs
        output attributes in upper not lower case.
      • makeClean

        protected boolean makeClean
        remove presentational clutter.
      • makeBare

        protected boolean makeBare
        Make bare HTML: remove Microsoft cruft.
      • logicalEmphasis

        protected boolean logicalEmphasis
        replace i by em and b by strong.
      • dropFontTags

        protected boolean dropFontTags
        discard presentation tags.
      • dropProprietaryAttributes

        protected boolean dropProprietaryAttributes
        discard proprietary attributes.
      • dropEmptyParas

        protected boolean dropEmptyParas
        discard empty p elements.
      • fixComments

        protected boolean fixComments
        fix comments with adjacent hyphens.
      • trimEmpty

        protected boolean trimEmpty
        trim empty elements.
      • breakBeforeBR

        protected boolean breakBeforeBR
        output a newline before br or not?
      • burstSlides

        protected boolean burstSlides
        create slides on each h2 element.
      • numEntities

        protected boolean numEntities
        use numeric entities.
      • quoteMarks

        protected boolean quoteMarks
        output " marks as &quot;.
      • quoteNbsp

        protected boolean quoteNbsp
        output non-breaking space as entity.
      • quoteAmpersand

        protected boolean quoteAmpersand
        output naked ampersand as &amp;.
      • wrapAttVals

        protected boolean wrapAttVals
        wrap within attribute values.
      • wrapScriptlets

        protected boolean wrapScriptlets
        wrap within JavaScript string literals.
      • wrapSection

        protected boolean wrapSection
        wrap within CDATA section tags.
      • wrapAsp

        protected boolean wrapAsp
        wrap within ASP pseudo elements.
      • wrapJste

        protected boolean wrapJste
        wrap within JSTE pseudo elements.
      • wrapPhp

        protected boolean wrapPhp
        wrap within PHP pseudo elements.
      • fixBackslash

        protected boolean fixBackslash
        fix URLs by replacing \ with /.
      • indentAttributes

        protected boolean indentAttributes
        newline+indent before each attribute.
      • xmlPIs

        protected boolean xmlPIs
        If set to yes, PIs must end with ?>.
      • xmlSpace

        protected boolean xmlSpace
        if set to yes, adds xml:space attr as needed.
      • encloseBodyText

        protected boolean encloseBodyText
        if yes, text at body level is wrapped in p elements.
      • encloseBlockText

        protected boolean encloseBlockText
        if yes, text in blocks is wrapped in p elements.
      • keepFileTimes

        protected boolean keepFileTimes
        if yes, last modified time is preserved.
      • word2000

        protected boolean word2000
        draconian cleaning for Word2000.
      • tidyMark

        protected boolean tidyMark
        add meta element indicating tidied doc.
      • emacs

        protected boolean emacs
        if true, format error output for GNU Emacs.
      • literalAttribs

        protected boolean literalAttribs
        if true, attributes may use newlines.
      • bodyOnly

        protected boolean bodyOnly
        output BODY content only.
      • fixUri

        protected boolean fixUri
        properly escape URLs.
      • lowerLiterals

        protected boolean lowerLiterals
        folds known attribute values to lower case.
      • replaceColor

        protected boolean replaceColor
        replace hex color attribute values with names.
      • hideComments

        protected boolean hideComments
        hides all (real) comments in output.
      • indentCdata

        protected boolean indentCdata
        indent CDATA sections.
      • forceOutput

        protected boolean forceOutput
        output document even if errors were found.
      • showErrors

        protected int showErrors
        number of errors to put out.
      • asciiChars

        protected boolean asciiChars
        convert quotes and dashes to nearest ASCII char.
      • joinClasses

        protected boolean joinClasses
        join multiple class attributes.
      • joinStyles

        protected boolean joinStyles
        join multiple style attributes.
      • escapeCdata

        protected boolean escapeCdata
        replace CDATA sections with escaped text.
      • ncr

        protected boolean ncr
        allow numeric character references.
      • cssPrefix

        protected java.lang.String cssPrefix
        CSS class naming for -clean option.
      • replacementCharEncoding

        protected java.lang.String replacementCharEncoding
        char encoding used when replacing illegal SGML chars, regardless of specified encoding.
      • tt

        protected TagTable tt
        TagTable associated with this Configuration.
      • report

        protected Report report
        Report instance. Used for messages.
      • definedTags

        protected int definedTags
        track what types of tags user has defined to eliminate unnecessary searches.
      • newline

        protected char[] newline
        bytes for the newline marker.
      • rawOut

        protected boolean rawOut
        Avoid mapping values > 127 to entities.
    • Constructor Detail

      • Configuration

        protected Configuration​(Report report)
        Instantiates a new Configuration. This method should be called by Tidy only.
        Parameters:
        report - Report instance
    • Method Detail

      • addProps

        public void addProps​(java.util.Properties p)
        adds configuration Properties.
        Parameters:
        p - Properties
      • parseFile

        public void parseFile​(java.lang.String filename)
        Parses a property file.
        Parameters:
        filename - file name
      • isKnownOption

        public static boolean isKnownOption​(java.lang.String name)
        Is the given String a valid configuration flag?
        Parameters:
        name - configuration parameter name
        Returns:
        true if the given String is a valid config option
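Because isKnownOption is a public static method that takes a String and returns a boolean, it can be used as a Predicate<String> to screen property names before they are applied. The sketch below shows that pattern. The helper class OptionCheckSketch and the stand-in check are hypothetical, included so the example runs without JTidy on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical helper; not part of org.w3c.tidy.
public class OptionCheckSketch {

    // Returns the property names rejected by the given check, in sorted order.
    public static List<String> unknownOptions(Properties props, Predicate<String> isKnown) {
        List<String> unknown = new ArrayList<>();
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            if (!isKnown.test(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("indent", "yes");
        props.setProperty("indnet", "yes"); // deliberate typo
        // Stand-in check (an assumption) so the sketch is self-contained;
        // with JTidy available, Configuration::isKnownOption would go here.
        Predicate<String> standIn = name -> name.equals("indent");
        System.out.println(unknownOptions(props, standIn)); // prints [indnet]
    }
}
```

Reporting the rejected names up front makes misspelled options visible instead of leaving them to be discovered later.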
      • adjust

        public void adjust()
        Ensure that config is self consistent.
      • printConfigOptions

        public void printConfigOptions​(java.io.Writer errout,
                                       boolean showActualConfiguration)
        prints available configuration options.
        Parameters:
        errout - where to write
        showActualConfiguration - print actual configuration values
      • getInCharEncodingName

        protected java.lang.String getInCharEncodingName()
        Getter for inCharEncodingName.
        Returns:
        Returns the inCharEncodingName.
      • setInCharEncodingName

        protected void setInCharEncodingName​(java.lang.String encoding)
        Setter for inCharEncodingName.
        Parameters:
        encoding - The inCharEncodingName to set.
      • getOutCharEncodingName

        protected java.lang.String getOutCharEncodingName()
        Getter for outCharEncodingName.
        Returns:
        Returns the outCharEncodingName.
      • setOutCharEncodingName

        protected void setOutCharEncodingName​(java.lang.String encoding)
        Setter for outCharEncodingName.
        Parameters:
        encoding - The outCharEncodingName to set.
      • setInOutEncodingName

        protected void setInOutEncodingName​(java.lang.String encoding)
        Setter for inOutCharEncodingName.
        Parameters:
        encoding - The CharEncodingName to set.
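The encoding setters above take an encoding name as a String, and convertCharEncoding describes these as standard Java encoding names. Whether the setters validate the name is not stated on this page, so a caller might check it first. The sketch below does that with java.nio.charset.Charset; the class EncodingNameSketch and the method isUsable are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper; not part of org.w3c.tidy.
public class EncodingNameSketch {

    // True if the running JVM recognizes the encoding name.
    public static boolean isUsable(String name) {
        if (name == null) {
            return false;
        }
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that no charset name may use.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isUsable("UTF-8"));
        System.out.println(isUsable("no such encoding!"));
    }
}
```

Only names that pass such a check would then be forwarded to setInCharEncodingName, setOutCharEncodingName, or setInOutEncodingName.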
      • setOutCharEncoding

        protected void setOutCharEncoding​(int encoding)
        Deprecated.
        use setOutCharEncodingName(String)
        Setter for outCharEncoding.
        Parameters:
        encoding - The outCharEncoding to set.
      • setInCharEncoding

        protected void setInCharEncoding​(int encoding)
        Deprecated.
        use setInCharEncodingName(String)
        Setter for inCharEncoding.
        Parameters:
        encoding - The inCharEncoding to set.
      • convertCharEncoding

        protected java.lang.String convertCharEncoding​(int code)
        Convert a char encoding from the deprecated tidy constant to a standard java encoding name.
        Parameters:
        code - encoding code
        Returns:
        encoding name