Import-Html: Use PowerShell to Export HTML Tables To Excel

I added a new feature, Import-Html, to my PowerShell Excel module. Provide the URL of a page that contains an HTML table tag, plus the zero-based index identifying which table on the page to return.

It locates the table, grabs the headers if present (or creates them on the fly), and finally exports the data directly to an Excel spreadsheet.

Get the Module

Grab the module from either the PowerShell Gallery or GitHub.

PowerShell Hashtables – You’re doing them wrong

Update from the PowerShell team:
$h.$key will perform very differently if $key has many unique values.

$h["abc"] or $h.abc will perform roughly the same in a loop.

Both versions add the numbers 1 to 1000 as keys to a hashtable.

The difference? The second version is 100 times faster.

In PowerShell you can access a hashtable key with dot notation (the first way) or with indexer notation (the second way).

Slight difference in syntax, significant difference in performance.
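The two versions are not reproduced in the text here, so the following is only a minimal sketch of what such a comparison might look like. The string keys, the variable names and the use of Measure-Command are assumptions made for illustration. No timings are claimed, since they depend on the PowerShell version and the machine.

```powershell
$count = 1000

# Version 1: dot notation with a variable key ($h.$key)
$dot = @{}
$dotTime = Measure-Command {
    foreach ($i in 1..$count) {
        $key = "$i"        # string keys are an assumption of this sketch
        $dot.$key = $i
    }
}

# Version 2: indexer notation ($h[$key])
$index = @{}
$indexTime = Measure-Command {
    foreach ($i in 1..$count) {
        $key = "$i"
        $index[$key] = $i
    }
}

# Report both timings side by side
"Dot notation:     {0:N2} ms" -f $dotTime.TotalMilliseconds
"Indexer notation: {0:N2} ms" -f $indexTime.TotalMilliseconds
```

Both loops end up with the same 1000 entries; only the way each key is addressed differs. Every iteration uses a different key value, which is the case the update from the PowerShell team singles out.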

Note: I found this out when building this Native PowerShell Spelling Corrector – Google Style.

Native PowerShell Spelling Corrector – Google Style

Peter Norvig, Director of Research at Google, posted “How to Write a Spelling Corrector”.

The PowerShell version reads a text file that has ~110K words, creates a lookup dictionary, and produces a list of possible corrections based on the edit distance between two words. See Mr. Norvig’s post for details.

All this in a page of PowerShell script, and it comes back in less than a second.

Invoke-SpellCorrector speling
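To make the edit-distance idea concrete, here is a rough sketch of the candidate-generation step from Mr. Norvig's post: every string that is one deletion, transposition, replacement or insertion away from the input. This is not the ported script itself. The function name Get-Edit1 is hypothetical, and the sketch assumes a lowercase a-z alphabet.

```powershell
function Get-Edit1 {
    param([string]$Word)

    $alphabet = 'abcdefghijklmnopqrstuvwxyz'.ToCharArray()
    # A HashSet keeps each candidate only once
    $results = New-Object 'System.Collections.Generic.HashSet[string]'

    for ($i = 0; $i -le $Word.Length; $i++) {
        # Split the word at every position
        $left  = $Word.Substring(0, $i)
        $right = $Word.Substring($i)

        if ($right.Length -gt 0) {
            # Deletion: drop the first character of the right half
            [void]$results.Add($left + $right.Substring(1))
            # Replacement: swap that character for each letter
            foreach ($c in $alphabet) {
                [void]$results.Add($left + $c + $right.Substring(1))
            }
        }
        if ($right.Length -gt 1) {
            # Transposition: exchange two adjacent characters
            [void]$results.Add($left + $right[1] + $right[0] + $right.Substring(2))
        }
        # Insertion: add each letter at the split point
        foreach ($c in $alphabet) {
            [void]$results.Add($left + $c + $right)
        }
    }

    $results
}

# The intended word is among the single-edit candidates
(Get-Edit1 'speling') -contains 'spelling'
```

A full corrector would then keep only the candidates found in the lookup dictionary and rank them by how often each word appears in the source text.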

Ported To PowerShell

I ported this back in 2007, PowerShell v1.0. Grab the script and the homles.txt file from GitHub.

Web Scraping with PowerShell – PDF Files

Often you want the information that is in a PDF. You want to extract data, munge text for input to another process, or parse and save the results to a database.

Parsing PDF data is a challenge. Fortunately, there is a great library, iTextSharp, that does the job for you. You can download it from the NuGet Gallery, and it's written for .NET. That means you can use it in PowerShell to automate PDF processing.

Get-PDFContent

Here is a script that lets you read a PDF and extract the content as a string.

Dot source the script and you can start reading local PDF files.
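The script itself is not reproduced here, so the following is only a sketch of how text extraction with iTextSharp might look, assuming an iTextSharp 5.x DLL. The function name Get-PdfTextSketch and both parameter names are hypothetical and are not the names used by Get-PDFContent.

```powershell
function Get-PdfTextSketch {
    param(
        # Path to the PDF file to read
        [Parameter(Mandatory = $true)]
        [string]$Path,

        # Location of the iTextSharp DLL (assumed to be a 5.x build)
        [Parameter(Mandatory = $true)]
        [string]$LibraryPath
    )

    # Load the iTextSharp assembly into the session
    Add-Type -Path $LibraryPath

    $reader = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $Path
    try {
        # Pages are numbered from 1; extract the text of each one
        $pages = foreach ($page in 1..$reader.NumberOfPages) {
            [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
        }
        # Return the whole document as a single string
        $pages -join [Environment]::NewLine
    }
    finally {
        # Release the file handle even if extraction fails
        $reader.Close()
    }
}
```

With placeholder paths, a call would look like `Get-PdfTextSketch -Path .\report.pdf -LibraryPath .\itextsharp.dll`.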

Bonus Points

This same script lets you read a PDF straight from the web.

Grab the PowerShell Script and iTextSharp

The PowerShell script and iTextSharp DLL are here on my GitHub repo.

Web Scraping with PowerShell – CSV Files

Reading CSV Files

PowerShell can work with CSV files from local files or in memory. Sometimes, there are CSV files on the web. You could download the file manually and then use a PowerShell script to process it with Import-Csv. You could also craft a script to download the file and then use Import-Csv.

Or, you could retrieve that file as a string and convert it on the fly with ConvertFrom-Csv. This option is the cleanest: it doesn't require a local file to be created, and when the script finishes, memory is reclaimed. On GitHub I created a CSV file, Album List, that has a random selection of albums.

You can retrieve and print them like this:

Here is a partial printing:

Invoke-RestMethod retrieves the text from the target URL. It is then piped to ConvertFrom-Csv, which creates an array of objects with the property names Artist and Name, and PowerShell prints them.
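As a rough sketch of that pipeline, the example below converts CSV text held in memory. The two rows are placeholders rather than entries from the real Album List, and the URL in the trailing comment is a stand-in, not the actual file location.

```powershell
# Placeholder CSV text standing in for what Invoke-RestMethod would return
$csvText = @"
Artist,Name
Artist A,Album One
Artist B,Album Two
"@

# ConvertFrom-Csv turns the text into objects with Artist and Name properties
$albums = $csvText | ConvertFrom-Csv
$albums | Format-Table Artist, Name

# Against a CSV file on the web, the same idea in one line
# (placeholder URL):
# Invoke-RestMethod 'https://example.com/albums.csv' | ConvertFrom-Csv
```

No file is written to disk at any point; the CSV text goes straight from a string to objects.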

For bonus points, grab the PowerShell Excel module from the PowerShell Gallery or GitHub and create a spreadsheet, using AutoSize so the columns are sized and ready for reading.
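A minimal sketch of that last step, assuming the module is installed under its gallery name, ImportExcel, and exposes Export-Excel with an -AutoSize switch. The rows and the output path are placeholders.

```powershell
# Placeholder rows standing in for the album objects
$rows = @(
    [pscustomobject]@{ Artist = 'Artist A'; Name = 'Album One' }
    [pscustomobject]@{ Artist = 'Artist B'; Name = 'Album Two' }
)

# Write to a temporary location (placeholder file name)
$xlsx = Join-Path ([IO.Path]::GetTempPath()) 'albums.xlsx'

if (Get-Command Export-Excel -ErrorAction SilentlyContinue) {
    # -AutoSize widens each column to fit its contents
    $rows | Export-Excel -Path $xlsx -AutoSize
}
else {
    Write-Warning 'Export-Excel not found. Try: Install-Module ImportExcel'
}
```

The guard keeps the snippet from failing on a machine where the module has not been installed yet.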
