Released v1.17.0 of The Pdfalyzer, the surprisingly popular tool for analyzing (possibly malicious) PDFs I created after my own unpleasant experience. Now ships with two command line tools for extracting stuff from PDF files:
1. extract_text_from_pdfs() - brute-force extract all text from a PDF, including #OCR of any embedded images
2. extract_pdf_pages() - rip a page range from a #PDF and write it to a new one
* GitHub: https://github.com/michelcrypt4d4mus/pdfalyzer
* PyPI: https://pypi.org/project/pdfalyzer/
* Homebrew: https://formulae.brew.sh/formula/pdfalyzer
* Fun thread someone made last week using Pdfalyzer to show just how byzantine the PDF format is: https://x.com/VikParuchuri/status/1965773078585344215