It’s estimated that data scientists spend 80% of their time converting data into a usable form.
Combining PowerShell’s ability for getting content from a web page and converting text to objects using the new “parse by example” functions is a potent cocktail for scrubbing and making data useable.
Let’s scrape this page for the Presidents of the United States, extract the name, plus starting and ending term. Then, we’ll used PowerShell’s
Add-Member to calculate the
TotalYears in office.
Invoke-WebRequest cmdlet and the
AllElements property gets content of the page. Piping the results to
Where to find the the HTML element where the class attribute is equal to
potus75Width (you can discover this by viewing the source of the web page).
The result of the
Where is piped to
ConvertFrom-String which uses the
$template to parse the data by example. In the template you identify the
TermEnd properties. Notice in the template you can tell
ConvertFrom-String the data type you want the property to be cast to, here you cast
In the last step you pipe the objects that have been converted from text to
ForEach so you can add a
Value of this property subtracts
The PowerShell Code
You pulled down the html from the web page, searched for specific data on that page and passed the results to
ConvertForm-String with a template showing an example of how you want the data extracted. For the final step, you enriched the data by adding a property that calculates the total years in office.
Now that the data has been converted into a usable form, you can easily export it to SQL, CSV, JSON or create an Excel spreadsheet with this PowerShell Excel Module. As a bonus, this PowerShell module can create Excel spreadsheets and Excel does not need to be installed on that machine. Perfect for server side operations.
$data | Export-Excel c:\Temp\potus.xlsx -Show -AutoSize