
Need Help Searching PDFs with PSWritePDF for Specific Text

August 16, 2025

Asked By CuriousCat42 On August 16, 2025

Hey everyone! I'm trying to search through PDFs for specific text occurrences. I downloaded a PDF from VMware and I'm looking to find sentences that mention "esxi". I can convert the PDF into an array of objects, but when I use Select-String to pipe in the object, it just prints out the entire content of the PDF instead of the specific matches I'm looking for. I'm also trying to loop through the pages, but that's returning the same result. Here's what I've got:

```powershell
Import-Module PSWritePDF

$myPDF = Convert-PDFToText -FilePath $file

# $matches = $myPDF | Select-String "esxi" -Context 1

$matches = [System.Collections.Generic.List[string]]::new()
$pages = $myPDF.length
for ($i=0; $i -le $pages; $i++) {
    $pageMatches = $myPDF[$i] | Select-String "esxi" -Context 1
    foreach ($pageMatch in $pageMatches) {
        $matches.Add($pageMatch)
    }
}
```

Has anyone tackled something similar? Any tips would be really appreciated!

3 Answers

Answered By PowerScriptWiz On August 19, 2025

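Your loop structure is on the right track (though `-le` should be `-lt`, otherwise the loop reads one index past the last page). The real issue is that `Select-String` treats each page in `$myPDF[$i]` as a single string, so any match returns the whole page. Split the page into lines first: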

```powershell
$lines = $myPDF[$i] -split "`r?`n"
$pageMatches = $lines | Select-String "esxi" -Context 1
```

This lets you find just the matching lines including context. Give it a shot and see if that helps!
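
For reference, here is a rough sketch of the whole loop with the split applied. It assumes `Convert-PDFToText` returns one string per page, as described in this thread. `$samplePages` and `$found` are made-up names, and the sample data is only there so the snippet runs without a PDF. The sketch also avoids the name `$matches`, which PowerShell uses as an automatic variable for the `-match` operator.

```powershell
# Stand-in for: $myPDF = Convert-PDFToText -FilePath $file
# (hypothetical sample data; assumes one multi-line string per page)
$samplePages = @(
    "Intro page`nNothing relevant here`nEnd of page",
    "Install guide`r`nESXi hosts need a reboot`r`nSee appendix"
)

# List[object] keeps the MatchInfo objects (and their context) intact
$found = [System.Collections.Generic.List[object]]::new()

# -lt (not -le) so the loop stops at the last valid index
for ($i = 0; $i -lt $samplePages.Count; $i++) {
    # Split the page into lines so Select-String evaluates each line separately
    $lines = $samplePages[$i] -split "`r?`n"
    foreach ($hit in ($lines | Select-String -Pattern 'esxi' -Context 1)) {
        $found.Add($hit)
    }
}

# Show only the matching lines
$found | ForEach-Object { $_.Line }
```

Each item in `$found` still carries its surrounding lines in `.Context.PreContext` and `.Context.PostContext` if you need them.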

Answered By PDFMasterPro On August 17, 2025

I think the issue here is with how the PDF is being converted. The underlying library seems to return one string per page rather than one per line. You might want to split each page into lines yourself, or use a tool like Ghostscript to convert the PDF into a text file that you can then work with in PowerShell.

Answered By DevDude99 On August 16, 2025

It sounds like you're running into a common issue. Each item in `$myPDF` is a multi-line string for a full page, so when you use `Select-String`, it considers the whole string as one unit. If "esxi" shows up anywhere in that string, it outputs the entire page.

To search line-by-line, you can split each page string into individual lines. Try this:

```powershell
$lines = $myPDF[$i] -split '\r?\n'
$pageMatches = $lines | Select-String "esxi" -Context 1
```

This way, you'll get just the matching lines plus their context without dumping the whole page. Just keep in mind that you might lose your page numbers unless you implement a solution to track them as well.
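
One possible way to keep the page numbers is to emit a small object per match that records the page index alongside the line. This is only a sketch: `$samplePages` is made-up data standing in for the `Convert-PDFToText` output (assumed to be one string per page), and the property names `Page`, `Line`, `Text`, `Before`, and `After` are arbitrary choices.

```powershell
# Hypothetical sample data standing in for the Convert-PDFToText output
$samplePages = @(
    "Overview`nESXi 8 release notes`nLicensing",
    "Networking`nvSwitch setup",
    "Upgrade an esxi host`nRollback steps"
)

$results = for ($i = 0; $i -lt $samplePages.Count; $i++) {
    # One array element per line of the current page
    $lines = $samplePages[$i] -split '\r?\n'
    foreach ($hit in ($lines | Select-String -Pattern 'esxi' -Context 1)) {
        [pscustomobject]@{
            Page   = $i + 1                    # 1-based page number
            Line   = $hit.LineNumber           # position of the line within the page
            Text   = $hit.Line                 # the matching line itself
            Before = $hit.Context.PreContext   # line(s) before the match
            After  = $hit.Context.PostContext  # line(s) after the match
        }
    }
}

$results | Format-Table Page, Line, Text
```

Because the results are plain objects, you can filter or export them afterwards, for example with `Where-Object Page -eq 3` or `Export-Csv`.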
