java.lang.Object
  |
  +--org.openxml.parser.BaseParser
Implements layer 0, layer 1 and some layer 2 parsing methods, in addition to error reporting and logging, and mode access.

Characters are read with readChar() and pushed back to be re-read with pushBack(). The last character read, or the next character to be pushed back, is held in the variable _curChar. The value EOF indicates that the end of the input stream has been reached.
The method setEncoding(java.lang.String) can be used to change the character encoding mid-stream, but is effective only if the input stream is of type XMLStreamReader. The method close() closes the parser once all input has been parsed, and isClosed() returns true afterwards.
Each token reading method places the text of the token it reads in _tokenText. The value in _tokenText is replaced each time one of these methods is called.
readTokenMarkup() reads and returns the markup token that follows the '<' sign; the '<' has already been consumed. readTokenEntity() reads and returns the general entity reference or character reference that follows the '&' sign; the '&' has already been consumed. readTokenPERef() reads and returns the parameter entity reference that follows the '%' sign; the '%' has already been consumed. If a valid token is not recognized, the token code TOKEN_TEXT is returned and the sign ('<', '&' or '%') is placed in _tokenText.
The method readTokenName() reads a valid token name into _tokenText. The characters that constitute a valid token name are defined by isNamePart(int, boolean). The method canReadName(java.lang.String) attempts to read and consume the specified name.

In addition, the following convenience methods are defined: isSpace(int) identifies a whitespace character; isTokenAllSpace() returns true if a token is all whitespace characters; slicePITokenText() slices a processing instruction into target name and instruction; readTokenQuoted() reads a quoted (single or double) string and returns it as token text.
Layer 2 methods shared by ContentParser and DTDParser are also defined. parseGeneralEntity(org.openxml.dom.EntityImpl) parses a general entity using a new parser instance. parseDocumentDecl(boolean) parses the document declaration found in XML documents, external subsets and external entities.
getLastException() returns the last exception issued. Exceptions are stored in LIFO order and can be retrieved by calling SAXException#getPrevious recursively on each exception. An error is issued by calling one of the error(int, java.lang.String) methods, which either stores the exception or throws a SAXException, depending on the severity level.
The error methods are defined by the ErrorSinkHandler interface, allowing the definition of an external error sink. Typically the document parser serves as an error sink for entity parsers (see setErrorSink(org.openxml.parser.ErrorSinkHandler)).
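The store-or-throw policy and LIFO ordering described above can be pictured with the following hypothetical sketch. The class name, the integer severity values and the `previous` field are assumptions made for illustration; they are not the openxml API, where the chain is walked with SAXException#getPrevious and the stop level comes from the Parser.STOP_SEVERITY_ constants.

```java
import org.xml.sax.SAXException;

// Hypothetical sketch of "store or throw, depending on severity"; not the openxml API.
final class ErrorLogSketch {
    // Assumed severity ordering; the real constants live in Parser.
    static final int SEVERITY_VALIDITY = 1;
    static final int SEVERITY_WELL_FORMED = 2;
    static final int SEVERITY_FATAL = 3;

    // Each stored error links to the one stored before it, giving LIFO retrieval.
    static final class ChainedError extends SAXException {
        final ChainedError previous;

        ChainedError(String message, ChainedError previous) {
            super(message);
            this.previous = previous;
        }
    }

    private final int stopAtSeverity;
    private ChainedError last; // most recent error: head of the chain

    ErrorLogSketch(int stopAtSeverity) {
        this.stopAtSeverity = stopAtSeverity;
    }

    void error(int severity, String message) throws SAXException {
        last = new ChainedError(message, last); // always record the error
        if (severity >= stopAtSeverity) {
            throw last;                         // severe enough to stop parsing
        }
    }

    ChainedError getLastException() {
        return last;
    }
}
```

With a stop level of SEVERITY_FATAL, validity and well-formedness errors are only recorded, and getLastException().previous walks back through them newest first.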
isMode(short) identifies which processing mode is in effect. The processing mode is controlled by the constructor.
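Because the parsing mode is a combination of MODE_ flags fixed at construction, isMode(short) can be pictured as a bit test. This is a hypothetical sketch: the flag values are invented, and it assumes isMode returns true only when every requested flag is set.

```java
// Hypothetical sketch: flag values are invented, not the real Parser.MODE_* constants.
final class ModeSketch {
    static final short MODE_STORE_COMMENT = 0x01; // assumed value
    static final short MODE_STORE_PI = 0x02;      // assumed value

    private final short mode;

    // As in BaseParser, the mode is fixed when the parser is constructed.
    ModeSketch(short mode) {
        this.mode = mode;
    }

    short getMode() {
        return mode;
    }

    // Assumption: true only if all requested flag bits are set.
    boolean isMode(short requested) {
        return (mode & requested) == requested;
    }
}
```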
See Also: Parser, SAXException, XMLStreamReader
Field Summary |
protected int | _curChar | Holds the last character read by readChar(), or the next character to be pushed back (see pushBack()). |
protected Document | _document | The XML/HTML/DTD document being processed. |
protected org.openxml.util.FastString | _tokenText | Holds the contents of the last token read by one of the token reading methods. |
protected static char | CR | Carriage return. |
protected static int | EOF | Indicates that the end of file (or the input stream) has been reached and no more characters are available. |
protected static char | LF | Line feed. |
protected static char | SPACE | Space. |
protected static short | TOKEN_CDATA | CDATA section token. |
protected static short | TOKEN_CLOSE_TAG | Close tag token. |
protected static short | TOKEN_COMMENT | Comment token. |
protected static short | TOKEN_DTD | DTD token. |
protected static short | TOKEN_ENTITY_REF | Entity reference token. |
protected static short | TOKEN_EOF | End of input. |
protected static short | TOKEN_OPEN_TAG | Open tag token. |
protected static short | TOKEN_PE_REF | Parameter entity reference. |
protected static short | TOKEN_PI | Processing instruction token. |
protected static short | TOKEN_SECTION | DTD section token. |
protected static short | TOKEN_SECTION_END | DTD section token end. |
protected static short | TOKEN_TEXT | Textual token. |
Constructor Summary |
protected | BaseParser(java.io.Reader reader, java.lang.String sourceURI, short mode, short stopAtSeverity) | Parser constructor. |
Method Summary |
protected void | advanceLineNumber(int increment) | Advances the line number count by the specified increment. |
protected boolean | canReadName(java.lang.String name) | Returns true if the specified name can be read in full, and consumes it. |
protected void | close() | Closes the input stream. |
void | error(int errorLevel, java.lang.String message) | |
void | fatalError(java.lang.Exception except) | |
int | getColumnNumber() | |
ErrorHandler | getErrorHandler() | Returns the error handler associated with this parser (if any). |
ErrorReport | getErrorReport() | Returns the error report facility associated with this parser (if any). |
SAXParseException | getLastException() | Returns the last exception issued. |
int | getLineNumber() | |
Locator | getLocator() | |
protected short | getMode() | Returns the current parsing mode. |
java.lang.String | getPublicId() | |
protected java.io.Reader | getReader() | Returns the reader used for accessing the underlying input stream. |
int | getSourcePosition() | |
java.lang.String | getSourceURI() | Deprecated. Use getSystemId() instead. |
java.lang.String | getSystemId() | |
protected boolean | isClosed() | Returns true if the document has been fully parsed and the parser has been closed. |
protected boolean | isMode(short mode) | Returns true if the specified parsing mode is in effect. |
protected boolean | isNamePart(int ch, boolean first) | Returns true if the character is part of a valid name. |
protected boolean | isSpace(int ch) | Returns true if the character is whitespace. |
protected boolean | isTokenAllSpace() | Returns true if the token is all whitespace. |
protected boolean | parseDocumentDecl(boolean XMLDecl) | Parses the document declaration for XML documents and external entities, returning the standalone status and changing the character encoding (if necessary). |
protected org.openxml.dom.EntityImpl | parseGeneralEntity(org.openxml.dom.EntityImpl entity) | Parses the general entity, returning the entity as parsed. |
protected void | pushBack() | Pushes back the last character read, held in _curChar. |
protected void | pushBack(int ch) | Pushes back a single character. |
protected int | readChar() | Reads and returns a single character from the input stream. |
protected int | readTokenEntity() | Reads a general entity reference token or character reference. |
protected int | readTokenMarkup() | Reads a markup token. |
protected boolean | readTokenName() | Reads a valid token name and places it in _tokenText. |
protected int | readTokenPERef() | Reads a parameter entity reference token. |
protected boolean | readTokenQuoted() | Reads a quoted identifier token. |
protected void | setEncoding(java.lang.String encoding) | Changes the encoding of the input stream. |
void | setErrorHandler(ErrorHandler handler) | Associates this parser with an error handler. |
void | setErrorSink(ErrorSinkHandler errorSink) | Deprecated. Use setErrorHandler(org.xml.sax.ErrorHandler) instead. |
protected java.lang.String | slicePITokenText() | Slices processing instruction text into target and instruction code. |
void | warning(java.lang.String message) | |
Methods inherited from class java.lang.Object | clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
protected static final int EOF
protected static final char LF
protected static final char CR
protected static final char SPACE
protected static final short TOKEN_EOF
protected static final short TOKEN_TEXT
_tokenText contains the plain text. This token is generally used to construct a Text node when it appears in the content.

protected static final short TOKEN_ENTITY_REF
_tokenText contains the entity name. This token is generally used to construct an EntityReference node.

protected static final short TOKEN_OPEN_TAG
_tokenText contains the tag name. This token is generally used to construct an Element node. Only the tag name is read; the attributes and terminating '>' should be read separately (see ContentParser.parseAttributes(org.w3c.dom.Element, boolean)).

protected static final short TOKEN_CLOSE_TAG
_tokenText contains the tag name. This token is generally used to close the corresponding Element node. The entire closing tag has been consumed.

protected static final short TOKEN_COMMENT
_tokenText contains the comment text (if in mode Parser.MODE_STORE_COMMENT). This token is generally used to construct a Comment node.

protected static final short TOKEN_PI
_tokenText contains the processing instruction (if in mode Parser.MODE_STORE_PI). This token is generally used to construct a ProcessingInstruction node.

protected static final short TOKEN_CDATA
_tokenText contains the CDATA contents. This token is generally used to construct a CDATASection node.

protected static final short TOKEN_DTD
_tokenText contains the DTD entity type (whatever comes after '<!').
protected static final short TOKEN_SECTION
Returned for a DTD conditional section, which begins with '<![' but is not a CDATA section (otherwise TOKEN_CDATA would have been returned). This token is valid only in the external DTD subset.

protected static final short TOKEN_SECTION_END
Marks the end of a DTD conditional section, as opposed to the end of a TOKEN_CDATA token. This token is valid only in the external DTD.

protected static final short TOKEN_PE_REF
_tokenText contains the entity name. This token is valid only in the DTD.

protected int _curChar
Holds the last character read by readChar(), or the next character to be pushed back (see pushBack()). Set to EOF if the end of the input stream has been reached.
See Also: readChar()

protected org.openxml.util.FastString _tokenText
Holds the contents of the last token read by one of the token reading methods, which return TOKEN_EOF if the end of the input stream has been reached. Many methods read, modify and possibly return values in _tokenText, so its value should not be assumed to remain constant between method calls. A StringBuffer is allocated and constantly reused by resetting its length to zero. In some instances, it is replaced with an alternative StringBuffer object. No other variable may hold a reference to this StringBuffer.
See Also: ContentParser.readTokenContent()
protected Document _document
The XML/HTML/DTD document being processed.
Constructor Detail |
protected BaseParser(java.io.Reader reader, java.lang.String sourceURI, short mode, short stopAtSeverity)
Parser constructor. Constructs a parser that reads from the specified Reader object, identified by a textual identifier. The parsing mode consists of a combination of MODE_.. flags. The constructor specifies the error severity level at which to stop parsing, either Parser.STOP_SEVERITY_FATAL, Parser.STOP_SEVERITY_VALIDITY or Parser.STOP_SEVERITY_WELL_FORMED.
reader - Any Reader from which entity text can be read
sourceURI - URI of the entity source
mode - The parsing mode in effect
stopAtSeverity - Severity level at which to stop parsing

Method Detail |
protected final org.openxml.dom.EntityImpl parseGeneralEntity(org.openxml.dom.EntityImpl entity) throws SAXException
Parses the general entity, returning the entity as parsed. An EntityImpl is passed to the method. On exit, the same entity (parsed) is returned, or null to indicate that the entity could not be parsed.

The following rules govern how the entity is parsed:

If the entity state is EntityImpl.STATE_PARSED, the entity has been parsed before and is returned.

If the entity state is EntityImpl.STATE_NOT_FOUND, the entity could not be found and null is returned. There is no need to issue the error again.

If the entity state is EntityImpl.STATE_PARSING, the entity is currently being parsed: this is a circular reference, so an error is issued and null is returned.

If the entity state is EntityImpl.STATE_DECLARED, the entity is parsed now. For an external entity, the entity source is located using HolderFinder. If the entity source could not be found or could not be opened, the entity state is set to EntityImpl.STATE_NOT_FOUND, an error is issued and null is returned. For an internal entity, the entity source is created from its value.

If the entity state is EntityImpl.STATE_DECLARED and the entity source could be located, an XMLParser is created and used to parse the entity. If no fatal errors are encountered while parsing, the entity is returned. Well-formedness errors are treated as if generated by the current parser.

If the entity state is EntityImpl.STATE_DECLARED and a fatal error was issued while parsing the entity with an XMLParser, a fatal error is issued and an exception is raised.
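The rules above amount to a small state machine. The following hypothetical sketch models it with an invented State enum, a simplified Entity class and a NestedParser callback standing in for the XMLParser; none of these names belong to openxml, and errors are merely printed where the real method would issue them. Setting the state to PARSING before the nested parse is what lets a self-referencing entity be detected as circular.

```java
import org.xml.sax.SAXException;

// Hypothetical sketch of the entity state rules; names and types are illustrative only.
final class EntityStateSketch {
    enum State { DECLARED, PARSING, PARSED, NOT_FOUND }

    static final class Entity {
        final String name;
        final String source;           // null stands in for "source could not be located"
        State state = State.DECLARED;

        Entity(String name, String source) {
            this.name = name;
            this.source = source;
        }
    }

    // Stands in for the nested XMLParser; returns false on a fatal error.
    interface NestedParser {
        boolean parse(Entity entity) throws SAXException;
    }

    static Entity parseGeneralEntity(Entity entity, NestedParser parser) throws SAXException {
        switch (entity.state) {
            case PARSED:
                return entity;                         // parsed before: reuse it
            case NOT_FOUND:
                return null;                           // error was already reported
            case PARSING:
                System.err.println("circular reference to entity " + entity.name);
                return null;
            default:
                break;                                 // DECLARED: parse it now
        }
        if (entity.source == null) {
            entity.state = State.NOT_FOUND;
            System.err.println("entity source not found: " + entity.name);
            return null;
        }
        entity.state = State.PARSING;                  // marks the entity as in progress
        if (!parser.parse(entity)) {
            throw new SAXException("fatal error while parsing entity " + entity.name);
        }
        entity.state = State.PARSED;
        return entity;
    }
}
```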
entity - The entity to parse

protected final boolean parseDocumentDecl(boolean XMLDecl) throws SAXException
Parses the document declaration for XML documents and external entities, returning the standalone status and changing the character encoding (if necessary).

The document declaration is contained in a processing instruction that appears at the very beginning of the document or entity and begins with 'xml' (case sensitive). The processing instruction's full text is expected in the variable _tokenText on entry.

The declaration for XML documents contains a version number, an optional character encoding and an optional standalone status. The default standalone status is false. The declaration for external entities and external subsets contains an optional version number and a mandatory character encoding.

Currently only XML version "1.0" is supported. The current character encoding is changed by calling setEncoding(java.lang.String).
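The rules above can be sketched as a check over the declaration's pseudo-attributes. This is a hypothetical illustration: it works on a plain String instead of _tokenText, signals errors with IllegalArgumentException rather than issuing SAX errors, uses a regular expression the real parser may not use, and only notes where setEncoding would be called.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of checking an XML/text declaration; not the openxml implementation.
final class DocumentDeclSketch {
    // name = "value" or name = 'value'
    private static final Pattern PSEUDO_ATTR =
        Pattern.compile("([A-Za-z]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    static Map<String, String> pseudoAttributes(String piText) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = PSEUDO_ATTR.matcher(piText);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            attrs.put(m.group(1), value);
        }
        return attrs;
    }

    // Returns the standalone status; xmlDecl mirrors the XMLDecl parameter.
    static boolean parseDocumentDecl(String piText, boolean xmlDecl) {
        Map<String, String> attrs = pseudoAttributes(piText);
        String version = attrs.get("version");
        if (xmlDecl && version == null)
            throw new IllegalArgumentException("version is required in an XML declaration");
        if (version != null && !version.equals("1.0"))
            throw new IllegalArgumentException("unsupported XML version " + version);
        if (!xmlDecl && attrs.get("encoding") == null)
            throw new IllegalArgumentException("encoding is required in an external entity declaration");
        // The real method would call setEncoding(...) here when an encoding is given.
        return xmlDecl && "yes".equals(attrs.get("standalone")); // default is false
    }
}
```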
XMLDecl - True if expecting an XML document declaration, false if expecting an external entity/subset declaration

protected final int readTokenMarkup() throws SAXException, java.io.IOException
Reads a markup token and places its text in _tokenText. The preceding '<' has been consumed prior to calling this method. No valid character is held in _curChar on entry or exit.
The following rules govern how tokens are parsed and which code is returned:

TOKEN_OPEN_TAG is returned for an opening tag. An opening tag is '<' immediately followed by a valid tag name (returned as token text) and optional whitespace. Attributes and the terminating '>' are not read by this method. Whitespace between the '<' and the tag name is not allowed.

TOKEN_CLOSE_TAG is returned for a closing tag. A closing tag is '</' followed by a valid tag name (returned as token text) and '>'. All text following the tag name up to the terminating '>' is ignored. Whitespace between the '</' and the tag name is not allowed; if it is present, an empty tag name is returned.

TOKEN_COMMENT is returned for a comment. A comment is delimited by '<!--' and '-->'. All text in between is consumed, and returned as token text if in mode Parser.MODE_STORE_COMMENT.

TOKEN_CDATA is returned for a CDATA section. The section starts with '<![CDATA[' and ends with ']]>'. All text in between is consumed and returned as token text.

TOKEN_PI is returned for a processing instruction. A processing instruction is delimited by '<?' and '?>'. All text in between is consumed, and returned as token text if in mode Parser.MODE_STORE_PI.

TOKEN_DTD is returned for a DTD declaration. A DTD declaration starts with '<!' immediately followed by a token name (returned as token text). The rest of the declaration is not read by this method. Whitespace between the '<!' and the token name is not allowed, and the token name consists of uppercase letters only.

TOKEN_SECTION is returned for a DTD conditional section. A conditional section begins with '<![' and is not a CDATA section. Only the '<![' sequence is read and consumed by this method.

If no valid markup is recognized, TOKEN_TEXT is returned, with '<' contained in _tokenText, and the input stream is not affected.

Returns: The token code, one of the codes listed above.
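The dispatch above is decided by the characters that follow '<'. A hypothetical sketch of just that classification step is shown below; it inspects a String instead of reading from the stream, uses its own Token enum in place of the TOKEN_ constants, and does not consume or return any token text.

```java
// Hypothetical sketch: classifies the markup following '<' by its prefix only.
final class MarkupSketch {
    enum Token { OPEN_TAG, CLOSE_TAG, COMMENT, CDATA, PI, DTD, SECTION, TEXT }

    static Token classify(String afterLt) {
        if (afterLt.startsWith("!--")) return Token.COMMENT;
        if (afterLt.startsWith("![CDATA[")) return Token.CDATA;
        if (afterLt.startsWith("![")) return Token.SECTION;      // conditional section, not CDATA
        if (afterLt.startsWith("!") && afterLt.length() > 1
                && Character.isUpperCase(afterLt.charAt(1))) return Token.DTD;
        if (afterLt.startsWith("?")) return Token.PI;
        if (afterLt.startsWith("/")) return Token.CLOSE_TAG;
        if (!afterLt.isEmpty() && isNameStart(afterLt.charAt(0))) return Token.OPEN_TAG;
        return Token.TEXT;                                       // '<' is treated as plain text
    }

    // No whitespace is allowed between '<' and the tag name.
    private static boolean isNameStart(char ch) {
        return Character.isLetter(ch) || ch == '_' || ch == ':';
    }
}
```

Order matters in this sketch: '<![CDATA[' must be tested before the more general '<![' so that CDATA sections are not reported as conditional sections.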
protected final int readTokenEntity() throws SAXException, java.io.IOException
Reads a general entity reference token or character reference. For an entity reference, returns the token code TOKEN_ENTITY_REF with the entity name in _tokenText. The preceding '&' has been consumed prior to calling this method, and the trailing ';' is consumed by this method. No valid character is held in _curChar on entry or exit.

If no valid entity name is found, the token code TOKEN_TEXT is returned, with '&' contained in _tokenText, and the input stream is not affected.

A '#' sign indicates a character reference (either decimal or hexadecimal), which is read and stored in _tokenText, and the token code TOKEN_TEXT is returned. If the character reference value is invalid, the token code TOKEN_TEXT is returned, with '&' contained in _tokenText, and the input stream is not affected.

If the entity reference or character reference is not terminated with a ';', a well-formedness error is issued, but the entity reference is still regarded as valid.

Returns: TOKEN_ENTITY_REF or TOKEN_TEXT
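Decoding the digits of a character reference can be sketched as below. This hypothetical helper takes the text between "&#" and ";" as a String and returns "&" for an invalid value, mirroring the fallback described above; it only checks that the value is a Unicode code point and does not apply the full XML Char production.

```java
// Hypothetical sketch of decoding the text between "&#" and ";".
final class CharRefSketch {
    // Returns the referenced character(s), or "&" if the reference is invalid.
    static String decode(String digits) {
        int radix = 10;
        String body = digits;
        if (body.startsWith("x")) {                  // hexadecimal form, e.g. "&#x41;"
            radix = 16;
            body = body.substring(1);
        }
        if (body.isEmpty()) return "&";
        int code = 0;
        for (int i = 0; i < body.length(); i++) {
            int d = Character.digit(body.charAt(i), radix);
            if (d < 0) return "&";                   // not a digit in this radix
            code = code * radix + d;
            if (code > Character.MAX_CODE_POINT) return "&"; // also prevents overflow
        }
        return new String(Character.toChars(code)); // handles supplementary characters
    }
}
```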
protected final int readTokenPERef() throws SAXException, java.io.IOException
Reads a parameter entity reference token. Returns the token code TOKEN_PE_REF with the entity name in _tokenText. The preceding '%' has been consumed prior to calling this method, and the trailing ';' is consumed by this method. No valid character is held in _curChar on entry or exit.

If no valid entity name is found, the token code TOKEN_TEXT is returned, with '%' contained in _tokenText, and the input stream is not affected.

If the entity reference is not terminated with a ';', a well-formedness error is issued, but the entity reference is still regarded as valid.

Returns: TOKEN_PE_REF or TOKEN_TEXT
protected final java.lang.String slicePITokenText() throws SAXException
Slices the processing instruction text held in _tokenText, returning the valid target name, with _tokenText truncated to contain just the instruction code. If no valid target name is found, an empty name is returned.

protected final boolean readTokenQuoted() throws SAXException, java.io.IOException
Reads a quoted (single or double) string and places it in _tokenText. Returns true if a quoted value was found (i.e. an opening quote came next on the input stream).

protected final boolean readTokenName() throws java.io.IOException
Reads a valid token name. If a valid name can be read, it is placed in _tokenText and true is returned; otherwise false is returned and the input stream is not affected.

A valid token name consists of a letter, underscore or colon, followed by zero or more letters, digits, underscores, hyphens, colons and periods. Unlike in many other languages, letters and digits may come from any language supported by Unicode.

_curChar does not contain a valid value on either entry or exit from this method.
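The name rule above can be sketched with java.lang.Character. This is a hypothetical helper that reads from a String rather than the input stream; it follows the rule as stated here, which is simpler than the full XML 1.0 Name production, and returns null (consuming nothing) when no name starts at the given position.

```java
// Hypothetical sketch of the name rule above; not the exact XML 1.0 Name production.
final class NameSketch {
    static boolean isNamePart(int ch, boolean first) {
        if (Character.isLetter(ch) || ch == '_' || ch == ':') return true;
        if (first) return false;                         // digits, '-' and '.' cannot start a name
        return Character.isDigit(ch) || ch == '-' || ch == '.';
    }

    // Returns the longest valid name starting at 'start', or null if there is none.
    static String readName(String input, int start) {
        int pos = start;
        while (pos < input.length()) {
            int ch = input.codePointAt(pos);              // Unicode-aware, including surrogates
            if (!isNamePart(ch, pos == start)) break;
            pos += Character.charCount(ch);
        }
        return pos == start ? null : input.substring(start, pos);
    }
}
```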
protected final boolean canReadName(java.lang.String name) throws java.io.IOException
protected final boolean isTokenAllSpace()
Returns true if the token is all whitespace. The token is held in _tokenText, and all its characters must be whitespace as defined by isSpace(int).
Returns: True if _tokenText is all whitespace

protected final boolean isNamePart(int ch, boolean first)
Returns true if the character is part of a valid name. Valid names consist of a letter, underscore or colon, followed by zero or more letters, digits, underscores, hyphens, colons and periods. Unlike in many other languages, letters and digits may come from any language supported by Unicode.
ch - The character to test
first - True if this is the first character in the name

protected final boolean isSpace(int ch)
Returns true if the character is whitespace.
ch - The character to check

protected final int readChar() throws java.io.IOException
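Reads and returns a single character from the input stream. If the end of the input stream has been reached, EOF is returned. The returned character is also available in the _curChar variable.

Line breaks (LF, CR and CR+LF) are returned as a single line feed (0x0A) character.
See Also: _curChar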
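Line-break normalization combined with the LIFO push-back stack used by pushBack() can be sketched as follows. This is a hypothetical illustration: the class name is invented, and it assumes EOF is -1 (the java.io.Reader convention), which the source does not state. A lone CR is turned into LF, and the character read after it is pushed back so it is not lost.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of readChar()/pushBack(); not the openxml implementation.
final class CharReaderSketch {
    static final int EOF = -1;                       // assumption: same as Reader.read()
    private final Reader reader;
    private final Deque<Integer> pushedBack = new ArrayDeque<>(); // LIFO stack
    int curChar;

    CharReaderSketch(Reader reader) {
        this.reader = reader;
    }

    int readChar() throws IOException {
        if (!pushedBack.isEmpty()) {
            curChar = pushedBack.pop();              // most recently pushed character first
            return curChar;
        }
        int ch = reader.read();
        if (ch == '\r') {
            int next = reader.read();
            if (next != '\n' && next != EOF) {
                pushedBack.push(next);               // lone CR: keep the following character
            }
            ch = '\n';                               // CR and CR+LF both become LF
        }
        curChar = ch;
        return ch;
    }

    void pushBack(int ch) {
        pushedBack.push(ch);
    }

    void pushBack() {
        pushBack(curChar);                           // push back the last character read
    }
}
```

Because the stack is LIFO, text such as "12" must be pushed back as pushBack('2') followed by pushBack('1') to be re-read in its original order.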
protected final void pushBack()
Pushes back the last character read, held in _curChar. The pushed back character will be returned when readChar() is called next. Any number of characters can be pushed back. The push-back buffer is a LIFO stack, so text should be pushed back in reverse order. It is not an error to push back the value EOF.

protected final void pushBack(int ch)
Pushes back a single character. The pushed back character will be returned when readChar() is called next. Any number of characters can be pushed back. The push-back buffer is a LIFO stack, so text should be pushed back in reverse order. It is not an error to push back the value EOF.
ch - The character to push back

protected final void setEncoding(java.lang.String encoding)
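Changes the encoding of the input stream. This method is effective only if the input stream is of type XMLStreamReader and does nothing otherwise. Nothing happens if the encoding is not recognized.

protected final java.io.Reader getReader()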
protected final void close()
protected final boolean isClosed()
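Returns true if the document has been fully parsed and the parser has been closed by calling the close() method.

protected final void advanceLineNumber(int increment)
Advances the line number count by the specified increment. This is needed when a DTDParser is used to read the internal subset in an XML document.
increment - The line number increment

public SAXParseException getLastException()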
public final int getLineNumber()
public final int getSourcePosition()
public final int getColumnNumber()
public final java.lang.String getSystemId()
public final java.lang.String getPublicId()
public final Locator getLocator()
public final void warning(java.lang.String message) throws SAXException
public final void fatalError(java.lang.Exception except) throws SAXException
public final void error(int errorLevel, java.lang.String message) throws SAXException
public final void setErrorHandler(ErrorHandler handler)
Associates this parser with an error handler. By default, an ErrorReport is used for reporting errors. Some applications may wish to provide their own error handler by calling this method. If the handler is set to null, all errors encountered will throw an exception and stop the parser.
handler - The new error handler to use, or null
See Also: ErrorHandler, ErrorReport
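setErrorHandler() accepts a standard org.xml.sax.ErrorHandler. A minimal handler that records warnings and recoverable errors but stops on fatal errors might look like the following. The class name is invented, and how BaseParser maps its own severity levels onto the three SAX callbacks is not stated here, so treat that mapping as an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Records warnings and recoverable errors; rethrows fatal errors to stop parsing.
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> messages = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        messages.add(format("warning", e));
    }

    @Override
    public void error(SAXParseException e) {
        messages.add(format("error", e));      // keep going after recoverable errors
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        messages.add(format("fatal", e));
        throw e;                               // stop parsing
    }

    private static String format(String kind, SAXParseException e) {
        return kind + " at " + e.getSystemId() + ":" + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

An instance would then be installed with parser.setErrorHandler(new CollectingErrorHandler()), where parser is an instance of some BaseParser subclass.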
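public final ErrorHandler getErrorHandler()
Returns the error handler associated with this parser (if any), which may be an ErrorReport.
See Also: ErrorHandler

public final ErrorReport getErrorReport()
Returns the error report facility associated with this parser (if any).
See Also: ErrorHandler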
public final void setErrorSink(ErrorSinkHandler errorSink)
Deprecated. Use setErrorHandler(org.xml.sax.ErrorHandler) instead.

public final java.lang.String getSourceURI()
Deprecated. Use getSystemId() instead.

protected final boolean isMode(short mode)
Returns true if the specified parsing mode is in effect.
mode - The mode(s) to check

protected final short getMode()
Returns the current parsing mode.