I blogged about visualizing US Congressional Earmarks from the Taxpayers for Common Sense data here. The data is an itemized list of Senators and members of the House of Representatives requesting funds. It also includes the Bill, Bill Section and Bill Subsection each request was made against.
The PowerShell script Do-Analysis does two things: it preps the data, creating nested dictionaries, and it provides four functions for discovery: Get-Names, Get-Requested, Get-NotRequested and Transform-Data.
Note that the Build-Dataset function processes the data using dimensions present in the original file. Typing Build-Dataset House Section sets up the dataset for answering the question “What did members of the House of Representatives request?”.
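To make the nested-dictionary idea concrete, here is a minimal sketch of how such a dataset could be built. It is not the code from Do-Analysis: the function name New-NestedDataset, the Senator and Section property names, and the sample rows are all assumptions made for illustration.

```powershell
# Hypothetical sample rows; the real data file's column names may differ.
$rows = @(
    [pscustomobject]@{ Senator = 'Clinton'; Section = 'Agriculture' }
    [pscustomobject]@{ Senator = 'Clinton'; Section = 'Corps of Engineers' }
    [pscustomobject]@{ Senator = 'Kyl';     Section = 'Bureau of Reclamation' }
    [pscustomobject]@{ Senator = 'Kyl';     Section = 'Corps of Engineers' }
)

# Hypothetical helper: builds Outer -> Inner -> 0|1 from flat rows.
function New-NestedDataset {
    param(
        [object[]]$Rows,
        [string]$OuterProperty,
        [string]$InnerProperty
    )

    # Every distinct inner value becomes a key under every outer key.
    $innerKeys = $Rows | ForEach-Object { $_.$InnerProperty } | Sort-Object -Unique

    $dataset = @{}
    foreach ($row in $Rows) {
        $outer = $row.$OuterProperty
        if (-not $dataset.ContainsKey($outer)) {
            # Start each outer key with 0 (not requested) for all inner keys.
            $dataset[$outer] = @{}
            foreach ($key in $innerKeys) { $dataset[$outer][$key] = 0 }
        }
        # Mark the combinations that actually appear in the data.
        $dataset[$outer][$row.$InnerProperty] = 1
    }
    $dataset
}

$dataset = New-NestedDataset -Rows $rows -OuterProperty Senator -InnerProperty Section
$dataset['Kyl']['Bureau of Reclamation']   # 1
$dataset['Kyl']['Agriculture']             # 0
```

Filling in the zeros up front is what makes a "not requested" query possible later, since every name carries an entry for every section.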
Senators first
After dot-sourcing the file with . .\Do-Analysis, you can look at what Hillary Clinton requested by running Get-Requested Clinton. Here is a sample:
Name                    Value
----                    -----
Agriculture             1
Aircraft Procurement    1
Conservation Programs   1
Corps of Engineers      1
Defense Health Program  1
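A lookup like this can be sketched as a filter over the inner dictionary. The function below is a hypothetical stand-in (named Get-RequestedSketch to keep it distinct from the real Get-Requested), and the data is made up for illustration.

```powershell
# Illustrative data only, shaped Senator -> Bill Section -> 0|1.
$dataset = @{
    Clinton = @{
        'Agriculture'           = 1
        'Bureau of Reclamation' = 0
        'Corps of Engineers'    = 1
    }
}

# Hypothetical sketch: return the entries whose value is 1, sorted by key.
function Get-RequestedSketch {
    param([hashtable]$Dataset, [string]$Name)

    $Dataset[$Name].GetEnumerator() |
        Where-Object { $_.Value -eq 1 } |
        Sort-Object Key
}

Get-RequestedSketch -Dataset $dataset -Name Clinton
```

Changing the comparison to 0 would give the complementary "not requested" view.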
For Senator Ted Stevens, run Get-Requested Stevens:
Name                       Value
----                       -----
Agriculture                1
Bureau of Land Management  1
Conservation Programs      1
Corps of Engineers         1
Defense Health Program     1
And finally, Senator Jon Kyl of Arizona:
Name                                     Value
----                                     -----
Bureau of Reclamation                    1
Corps of Engineers                       1
Department of Health and Human Services  1
Department of Justice                    1
Environmental Protection Agency          1
We can also look at what Senator Clinton didn’t request with Get-NotRequested Clinton:
Name                       Value
----                       -----
Administrative Provisions  0
BRAC Projects 2005         0
Bureau of Indian Affairs   0
Bureau of Land Management  0
Bureau of Reclamation      0
What if we wanted to know who requested funds in the bill section Rural Development Programs? No problem. Transform-Data pivots the nested dictionaries, so we can use the same Get-Requested function and pass Rural Development Programs as the query: Get-Requested 'Rural Development Programs'. The results are the Senators who requested monies for this section of the bill.
Name       Value
----       -----
Baucus     1
Harkin     1
Johnson    1
Kohl       1
Leahy      1
Lincoln    1
McConnell  1
none       1
Pryor      1
Specter    1
Tester     1
Use Transform-Data again and the dictionaries are back in their original shape.
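The pivot itself amounts to swapping the outer and inner keys. Here is a rough sketch of that idea, using a hypothetical function name (Invoke-PivotSketch) and made-up data. The real Transform-Data appears to change the dataset in place, whereas this version returns a new dictionary.

```powershell
# Hypothetical sketch: turn Outer -> Inner -> value into Inner -> Outer -> value.
function Invoke-PivotSketch {
    param([hashtable]$Dataset)

    $pivoted = @{}
    foreach ($outer in $Dataset.Keys) {
        foreach ($inner in $Dataset[$outer].Keys) {
            if (-not $pivoted.ContainsKey($inner)) { $pivoted[$inner] = @{} }
            $pivoted[$inner][$outer] = $Dataset[$outer][$inner]
        }
    }
    $pivoted
}

# Illustrative data only, shaped Senator -> Bill Section -> 0|1.
$bySenator = @{
    Baucus = @{ 'Rural Development Programs' = 1; 'Corps of Engineers' = 1 }
    Kyl    = @{ 'Rural Development Programs' = 0; 'Corps of Engineers' = 1 }
}

$bySection = Invoke-PivotSketch -Dataset $bySenator
$bySection['Rural Development Programs']['Baucus']   # 1
$bySection['Rural Development Programs']['Kyl']      # 0

# Pivoting a second time restores the Senator -> Bill Section shape.
$roundTrip = Invoke-PivotSketch -Dataset $bySection
$roundTrip['Kyl']['Corps of Engineers']               # 1
```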
Deeper Analysis
Shaping the dataset as Senator -> Bill Section -> 0|1, where 0 indicates no request and 1 indicates a request, answers what each Senator requested. Transform-Data re-shapes the dictionaries into Bill Section -> Senator -> 0|1 so we can answer who requested funds in which bill sections.
In follow-up posts we’ll look at relationships between the dimensions, iterating over the data and calculating Pearson correlation coefficients so we can show neighbors and links between bill sections and Senators’ request patterns.
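As a preview of that calculation, here is a minimal sketch of a Pearson correlation between two inner dictionaries, using the standard sum-based form of the coefficient. The function name Get-PearsonSketch and the sample dictionaries are assumptions for illustration, not code from the script.

```powershell
# Hypothetical sketch: Pearson correlation over the keys two dictionaries share.
function Get-PearsonSketch {
    param([hashtable]$A, [hashtable]$B)

    # Compare only the keys present in both dictionaries.
    $shared = @($A.Keys | Where-Object { $B.ContainsKey($_) })
    $n = $shared.Count
    if ($n -eq 0) { return 0 }

    $sumA = 0.0; $sumB = 0.0; $sumASq = 0.0; $sumBSq = 0.0; $sumAB = 0.0
    foreach ($key in $shared) {
        $x = [double]$A[$key]
        $y = [double]$B[$key]
        $sumA += $x
        $sumB += $y
        $sumASq += $x * $x
        $sumBSq += $y * $y
        $sumAB += $x * $y
    }

    $numerator = $sumAB - ($sumA * $sumB / $n)
    $denominator = [math]::Sqrt(($sumASq - $sumA * $sumA / $n) * ($sumBSq - $sumB * $sumB / $n))

    # A constant series has no variance, so treat the correlation as 0.
    if ($denominator -eq 0) { return 0 }
    $numerator / $denominator
}

# Made-up request patterns over four bill sections.
$first    = @{ a = 1; b = 0; c = 1; d = 0 }
$same     = @{ a = 1; b = 0; c = 1; d = 0 }
$opposite = @{ a = 0; b = 1; c = 0; d = 1 }

Get-PearsonSketch -A $first -B $same       # 1
Get-PearsonSketch -A $first -B $opposite   # -1
```

Values near 1 mean two Senators (or two bill sections, after pivoting) show similar patterns, while values near -1 mean the patterns are opposed.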
References
These techniques come from Chapter 2 of the book Programming Collective Intelligence, implemented here in PowerShell instead of Python.
Jon Udell posted “The Congressional content management system”. He highlights:
The absurdity of expecting people to make sense of complex texts
He also notes that an XML file name is embedded in one of the Congressional PDFs. The good news: Congress may be using automation to do revisions. The bad news: it is not publicly available.
Download
Here are the two files for download if you want to play with PowerShell and the data.