public class COSParser extends BaseParser implements ICOSParser
QuickParser presented in PDFBOX-1104 by Jeremy Villalobos.

| Modifier and Type | Field and Description |
|---|---|
| protected static char[] | EOF_MARKER: EOF-marker. |
| protected long | fileLen: file length. |
| protected boolean | initialParseDone |
| protected static char[] | OBJ_MARKER: obj-marker. |
| protected SecurityHandler<? extends ProtectionPolicy> | securityHandler: The security handler. |
| static String | SYSPROP_EOFLOOKUPRANGE: The range within which the %%EOF marker will be searched. |
| protected XrefTrailerResolver | xrefTrailerResolver: Collects all Xref/trailer objects and resolves them into a single object using the startxref reference. |

Fields inherited from class BaseParser: A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, source, STREAM_STRING, T

| Constructor and Description |
|---|
| COSParser(org.apache.pdfbox.io.RandomAccessRead source): Default constructor. |
| COSParser(org.apache.pdfbox.io.RandomAccessRead source, String password, InputStream keyStore, String keyAlias): Constructor for encrypted PDFs. |

| Modifier and Type | Method and Description |
|---|---|
| protected void | checkPages(COSDictionary root): Check if all entries of the pages dictionary are present. |
| org.apache.pdfbox.io.RandomAccessReadView | createRandomAccessReadView(long startPosition, long streamLength): Creates a random access read view starting at the given position with the given length. |
| COSBase | dereferenceCOSObject(COSObject obj): Dereference the COSBase object which is referenced by the given COSObject. |
| protected AccessPermission | getAccessPermission(): This will get the AccessPermission. |
| protected PDEncryption | getEncryption(): This will get the encryption dictionary. |
| boolean | isLenient(): Return true if parser is lenient. |
| protected boolean | isString(char[] string): Checks if the given string can be found at the current offset. |
| protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff): Searches for the last appearance of the pattern within the buffer. |
| protected COSStream | parseCOSStream(COSDictionary dic): This will read a COSStream from the input stream using the length attribute within the dictionary. |
| protected boolean | parseFDFHeader(): Parse the header of an FDF. |
| protected COSBase | parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj): Parse the object for the given object key. |
| protected COSBase | parseObjectStreamObject(long objstmObjNr, COSObjectKey key): Parse the object with the given key from the object stream with the given number. |
| protected boolean | parsePDFHeader(): Parse the header of a PDF. |
| protected boolean | parseXrefTable(long startByteOffset): This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored. |
| protected void | prepareDecryption(): Prepare for decryption. |
| protected boolean | resetTrailerResolver(): Indicates whether the xref trailer resolver should be reset or not. |
| protected COSDictionary | retrieveTrailer(): Read the trailer information and provide a COSDictionary containing the trailer information. |
| void | setEOFLookupRange(int byteCount): Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. |
| protected void | setLenient(boolean lenient): Change the parser leniency flag. |

Methods inherited from class BaseParser: getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces

public static final String SYSPROP_EOFLOOKUPRANGE
protected static final char[] EOF_MARKER
protected static final char[] OBJ_MARKER
protected long fileLen
protected boolean initialParseDone
protected SecurityHandler<? extends ProtectionPolicy> securityHandler
protected XrefTrailerResolver xrefTrailerResolver

public COSParser(org.apache.pdfbox.io.RandomAccessRead source) throws IOException
Parameters:
source - input representing the PDF.
Throws:
IOException - if something went wrong

public COSParser(org.apache.pdfbox.io.RandomAccessRead source, String password, InputStream keyStore, String keyAlias) throws IOException
Parameters:
source - input representing the PDF.
password - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security
keyAlias - alias to be used for decryption when using public key security
Throws:
IOException - if the source data could not be read

public void setEOFLookupRange(int byteCount)
Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. The default is DEFAULT_TRAIL_BYTECOUNT.
We check that the new value is at least 16. For practical use, however, this value should not be lower than 1000; even 2000 was found to be insufficient in some cases where trailing garbage such as HTML snippets followed the EOF marker.
If the system property SYSPROP_EOFLOOKUPRANGE is defined, this value is set on initialization but can be overwritten later.
Parameters:
byteCount - number of trailing bytes

protected COSDictionary retrieveTrailer() throws IOException
Throws:
IOException - if something went wrong

protected boolean resetTrailerResolver()
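
As a rough illustration of why the lookup range configured through setEOFLookupRange matters, the sketch below searches only the trailing bytes of a buffer for the %%EOF marker. It is not PDFBox code: the class EofLookupSketch and the method findEofMarker are hypothetical names chosen for this example, and the actual parser may differ in its details.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not part of the PDFBox API.
class EofLookupSketch {
    // Marker bytes searched for near the end of the file.
    static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Returns the offset of the last "%%EOF" that lies completely inside the
     * trailing lookupRange bytes of data, or -1 if there is none.
     */
    static int findEofMarker(byte[] data, int lookupRange) {
        int windowStart = Math.max(0, data.length - lookupRange);
        for (int i = data.length - EOF_MARKER.length; i >= windowStart; i--) {
            if (matchesAt(data, i)) {
                return i;
            }
        }
        return -1;
    }

    // True if EOF_MARKER occurs in data starting at offset.
    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[offset + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 39 bytes of trailing junk follow the marker.
        String junk = "<html><body>trailing junk</body></html>";
        byte[] data = ("startxref\n0\n%%EOF\n" + junk).getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findEofMarker(data, 16));   // prints -1: window too small
        System.out.println(findEofMarker(data, 1000)); // prints 12: window covers the marker
    }
}
```

With a window of 16 bytes the marker is hidden behind the trailing junk, which mirrors the advice above to keep the range at 1000 or more in practice.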
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Parameters:
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
Returns:
-1 if pattern could not be found

public boolean isLenient()
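
The backward search documented for lastIndexOf can be sketched as follows. This is a minimal stand-alone version under stated assumptions, not the PDFBox implementation: the class name LastIndexOfSketch is hypothetical, and returning the start offset of the match on success is an assumption, since the text above only specifies the -1 case.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone sketch; not part of the PDFBox API.
class LastIndexOfSketch {
    /**
     * Searches backwards for pattern in buf, considering only matches that
     * end before endOff (exclusive). Returns the assumed start offset of the
     * last match, or -1 if the pattern could not be found.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        int end = Math.min(endOff, buf.length); // guard against an oversized endOff
        for (int start = end - pattern.length; start >= 0; start--) {
            int j = 0;
            // Compare byte by byte; sufficient for ASCII markers such as "obj".
            while (j < pattern.length && buf[start + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] buf = "obj endobj obj".getBytes(StandardCharsets.ISO_8859_1);
        char[] obj = "obj".toCharArray();
        System.out.println(lastIndexOf(obj, buf, buf.length)); // prints 11
        System.out.println(lastIndexOf(obj, buf, 13));         // prints 7
        System.out.println(lastIndexOf("xref".toCharArray(), buf, buf.length)); // prints -1
    }
}
```

Lowering endOff after each hit lets a caller walk through all occurrences from the end of the buffer towards its start.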
protected void setLenient(boolean lenient)
Parameters:
lenient - try to handle malformed PDFs.

public COSBase dereferenceCOSObject(COSObject obj) throws IOException
Specified by:
dereferenceCOSObject in interface ICOSParser
Parameters:
obj - the COSObject which references the COSBase object to be dereferenced.
Throws:
IOException - if something went wrong when dereferencing the COSBase object

public org.apache.pdfbox.io.RandomAccessReadView createRandomAccessReadView(long startPosition, long streamLength) throws IOException
Specified by:
createRandomAccessReadView in interface ICOSParser
Parameters:
startPosition - start position within the underlying random access read
streamLength - stream length
Throws:
IOException - if something went wrong when creating the view for the RandomAccessRead

protected COSBase parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj) throws IOException
Parameters:
objKey - key of object to be parsed
requireExistingNotCompressedObj - if true, the object to be parsed must be defined in the xref (comment: null objects may be missing from the xref) and it must not be a compressed object within an object stream (this is used to avoid being stuck in a loop in a malicious PDF)
Throws:
IOException - If an IO error occurs.

protected COSBase parseObjectStreamObject(long objstmObjNr, COSObjectKey key) throws IOException
Parameters:
objstmObjNr - the number of the object stream
key - the key of the object to be parsed
Throws:
IOException - if something went wrong when parsing the object

protected COSStream parseCOSStream(COSDictionary dic) throws IOException
Parameters:
dic - dictionary that goes with this stream.
Throws:
IOException - if an error occurred reading the stream, such as problems reading the length attribute, the stream not ending with 'endstream' after the data was read, or the stream being too short.

protected void checkPages(COSDictionary root) throws IOException
Parameters:
root - the root dictionary of the PDF
Throws:
IOException - if the page tree root is null

protected boolean isString(char[] string) throws IOException
Parameters:
string - the bytes of the string to look for
Throws:
IOException - if something went wrong

protected boolean parsePDFHeader() throws IOException
Throws:
IOException - if something went wrong

protected boolean parseFDFHeader() throws IOException
Throws:
IOException - if something went wrong

protected boolean parseXrefTable(long startByteOffset) throws IOException
Parameters:
startByteOffset - the offset to start at
Throws:
IOException - If an IO error occurs.

protected PDEncryption getEncryption() throws IOException
Throws:
IOException - If there is an error getting the document.

protected AccessPermission getAccessPermission() throws IOException
Throws:
IOException - If there is an error getting the document.

protected void prepareDecryption() throws IOException
Throws:
InvalidPasswordException - If the password is incorrect.
IOException - if something went wrong

Copyright © 2002–2022 The Apache Software Foundation. All rights reserved.