Amtrak’s data could help riders and employees — too bad it’s so hard to find

Amtrak train in Plano, Illinois by railsr4me licensed under Creative Commons.

This article originally appeared in Mobility Lab.

Amtrak is a valuable resource for the transportation ecosystem – especially in the Northeast – but few people know that.

Opening its data could help Amtrak tell its story and provide context for delays and service disruptions, creating transparency that could build support for the system. While Amtrak is working towards this, its data isn’t publicly available yet.

Transportation Techies – a Washington, DC-based meet-up group of transportation data nerds – met to share their travails in manipulating and visualizing Amtrak’s information, which turned out to be no simple task compared to the myriad projects people are able to pursue with local bicycle, pedestrian, and transit data.

Turns out, it’s hard to play with Amtrak’s data.

Sunny Zheng walked the room through the process of unwrapping Amtrak’s public-facing data feed of real-time train positions. Zheng wanted to build a web tool to track trains throughout the network, but instead encountered a data feed that had been intentionally obfuscated in code.

After reverse engineering multiple layers of messy code, Zheng uncovered comments from Amtrak’s developer laughing about misleading outside coders and hiding the data feeds.

Chris Juckins and a colleague built a similar tool to track on-time performance since, on occasion, the trains don’t run on time. Because Amtrak offers no direct feed for on-time data, the tool pulls from Amtrak’s Train Status page to populate an on-time performance database. This allows Juckins to create visuals of how trains behave over their entire route, and even to track the historical reliability of any specific train.

Juckins has already envisioned a number of uses for just this set of data. Passengers and crews could better plan for train arrivals with more accurate arrival calculations, particularly along long-haul routes, and customers looking to buy multiple segments can calculate the odds of successfully transferring based on the likelihood of trains arriving on time.
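As a rough sketch of the transfer calculation described above: given a history of arrival delays for an inbound train, the share of past arrivals that would have made a given layover is a simple empirical estimate of the odds. The function name, parameters, and delay figures below are hypothetical, chosen for illustration, and are not taken from Juckins’s tool.

```python
def connection_odds(arrival_delays_min, layover_min, min_transfer_min=0):
    """Fraction of historical arrivals that left enough time to transfer.

    arrival_delays_min: past delays in minutes (negative means early).
    layover_min: scheduled gap between arrival and the connecting departure.
    min_transfer_min: time a passenger needs to change trains.
    """
    if not arrival_delays_min:
        raise ValueError("need at least one historical arrival")
    made = sum(
        1 for delay in arrival_delays_min
        if delay + min_transfer_min <= layover_min
    )
    return made / len(arrival_delays_min)


# Hypothetical delay history for one train at one station (made-up numbers).
delays = [0, 5, 12, 45, 90, -3, 20, 130, 8, 60]
odds = connection_odds(delays, layover_min=60, min_transfer_min=10)
print(f"Estimated odds of making the connection: {odds:.0%}")
```

With more history, the same count could be restricted to a season or day of the week, since long-haul delays are unlikely to be uniform across the year.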

These tools have proven helpful: Amtrak field crews want this information, according to Zheng and Juckins. Stationmasters along long-haul routes have reached out to use the tool for updating their arrival estimates, and maintenance crews have used the data to track down specific problem areas. It’s telling both that crews seek outside sources for information and that the data is so difficult for these coders to use.

Don Varley, Amtrak’s manager of safety analytics, shared a visualization of trespassing incidents across Amtrak’s national network, which showed both the scale of the problem and how difficult it can be to collect this data.

Trespassing incidents are increasing year after year, and Amtrak is trying to figure out how to reverse this trend. Following this in real time would be difficult, though, since about 25 percent of the data is incorrectly entered the first time around – when mapped, one can find incident coordinates spread across the globe – so the data needs constant vigilance and revision to get right. Overall, though, Varley is using the data to work with municipalities to address the biggest causes of these incidents.
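One simple way to catch coordinates that land on the wrong side of the globe is a bounding-box check before mapping. The sketch below is only an illustration: the box is a rough approximation of the contiguous United States, the record layout is invented, and nothing here reflects how Amtrak actually cleans its incident data.

```python
# Approximate bounding box for the contiguous United States. This is an
# assumption for illustration; a real check would compare each point
# against the rail network itself.
LAT_MIN, LAT_MAX = 24.0, 50.0
LON_MIN, LON_MAX = -125.0, -66.0


def in_service_area(lat, lon):
    """True if the point falls inside the rough bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def split_incidents(incidents):
    """Separate plausible records from ones that need manual review."""
    plausible, flagged = [], []
    for incident in incidents:
        if in_service_area(incident["lat"], incident["lon"]):
            plausible.append(incident)
        else:
            flagged.append(incident)
    return plausible, flagged


# Hypothetical records showing two common data-entry mistakes.
incidents = [
    {"id": 1, "lat": 38.90, "lon": -77.01},  # plausible: Washington, DC area
    {"id": 2, "lat": 38.90, "lon": 77.01},   # minus sign dropped from longitude
    {"id": 3, "lat": -77.01, "lon": 38.90},  # latitude and longitude swapped
    {"id": 4, "lat": 41.88, "lon": -87.64},  # plausible: Chicago area
]
plausible, flagged = split_incidents(incidents)
print([i["id"] for i in plausible], [i["id"] for i in flagged])
```

A check like this only flags gross errors; a point that is wrong by a few miles would still pass and need review by other means.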

Michael Schade reprised his Flickr geotag tracker from an earlier Bike Hack Night to visualize where people tag Amtrak on the photos they upload. Displayed as a heat map based on the density of tags, the tracker lets one follow where people are talking about Amtrak, and what they’re saying about it, through photos. Apart from some errant tags across the District, far from any train tracks, the heat map illuminated the Amtrak network and, more importantly, showed how people enjoy traveling by train on the long-haul routes.
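The density behind a heat map like this can be approximated by counting geotags per grid cell. The following is a minimal sketch under stated assumptions: the points are made up, the one-degree cell size is arbitrary, and it does not reproduce Schade’s tracker or call the Flickr API.

```python
import math
from collections import Counter


def tag_density(points, cell_deg=1.0):
    """Count geotagged points per grid cell of cell_deg degrees."""
    counts = Counter()
    for lat, lon in points:
        # math.floor keeps negative longitudes in the correct cell.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical (lat, lon) geotags from photos tagged "Amtrak".
points = [
    (38.90, -77.01),
    (38.95, -77.03),
    (39.29, -76.61),
    (41.88, -87.64),
]
density = tag_density(points)
for cell, count in density.most_common():
    print(cell, count)
```

Cells with higher counts would be drawn hotter; a smaller cell size gives a finer map at the cost of sparser counts.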

The photo heat map also illuminates, in a way, Schade’s opening remarks in which he presented his Amtrak Open Data Manifesto. In it, Schade calls for an open data portal (like Capital Bikeshare’s), a published General Transit Feed Specification, published passenger counts, and an API for trains, fares, capacity, availability, stations, and schedules. There’s so much that coders have been able to get from Amtrak’s data even in its current state – just imagine what could be done if it were open to the public.
