Projects like the Mobility Lab’s real-time screens and Transit Near Me can help riders and boost transit usage, but they can only show information for agencies which provide open data. How do our region’s agencies stack up?
The table below lists the many transit agencies in the Washington region and their open data progress. In a nutshell, there are 2 kinds of open data: schedule data and real-time arrival data.
General Transit Feed Specification (GTFS) files list schedules and the locations of stops and routes, powering applications like maps and trip planners. Real-time arrival data lets applications tell riders how far away the bus actually is, in tools like smartphone apps or digital screens.
|Schedule data||Real-time data|
|Public GTFS||Shapes in GTFS||On Google||Tracking||Tracking API|
|DASH (Alexandria)||Via email only3|
|Ride On (Montgomery)|
|The Bus (Prince George’s)|
|MTA (Maryland) commuter bus|
|Fairfax (County) Connector|
|Loudoun County Transit|
|Mix of GPS & manual7|
What the columns mean
Creating public GTFS feeds (the 1st column) allows someone who’s written an app to easily incorporate schedule and route data for a transit agency. GTFS has emerged as a national standard for representing transit feeds, and there’s tremendous value in having as many agencies as possible support the same standard. That way, if someone writes an app in Chicago, they can make it work in Denver, Albany, or Miami at the same time.
Most of the transit agencies’ feeds include the paths that the vehicles take, but some, like DASH’s, do not. The 2nd column shows this information. Feeds without paths are still usable, but apps that visualize routes, like Transit Near Me, end up showing unsightly diagonal lines cutting across city blocks.
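For readers who want to check a feed themselves, here is a minimal sketch of how an app might test for paths. In GTFS, paths live in an optional file, shapes.txt, and each trip in trips.txt can point to one through an optional shape_id field. The function name shape_coverage is made up for this illustration, and the sketch assumes the feed files sit at the top level of the zip archive.

```python
import csv
import io
import zipfile


def shape_coverage(feed):
    """Return the fraction of trips in a GTFS zip that reference a shape.

    `feed` is a path or file-like object for the zip archive.
    (Hypothetical helper; assumes files are at the top level of the zip.)
    """
    with zipfile.ZipFile(feed) as archive:
        names = set(archive.namelist())
        # shapes.txt is optional in GTFS; without it an app has no paths to draw.
        if "shapes.txt" not in names or "trips.txt" not in names:
            return 0.0
        with archive.open("trips.txt") as raw:
            # utf-8-sig tolerates the byte-order mark some exporters add.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            trips = list(csv.DictReader(text))
    if not trips:
        return 0.0
    # Count trips whose shape_id column is present and non-empty.
    with_shape = sum(1 for trip in trips if (trip.get("shape_id") or "").strip())
    return with_shape / len(trips)
```

A result of 1.0 means every trip has a path, while 0.0 means an app would have to connect the stops with straight lines. Anything in between suggests paths are missing for some routes.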
Agencies can also sign a contract with Google to have their routes and schedules on Google Maps. The 3rd column shows agencies which have done this. Some agencies put out their data files, but aren’t willing to sign this contract because of indemnification or other clauses which Google unfortunately insists upon. On the flip side, some agencies sign up with Google but then don’t publish the GTFS feed publicly.
The agency might provide it to those who ask, or might not, but this dissuades app creators from including this agency, and makes it harder for them to get regular updates. Every agency should strive to host a public and up-to-date GTFS feed on their site so that anyone building apps can easily incorporate that agency’s services into the tool.
The other type of open data is real-time locations or predictions. To make this possible, agencies first have to deploy AVL (Automatic Vehicle Location) technology on their buses or trains (the 4th column). The main obstacle is that this is somewhat expensive; a physical device has to go into each vehicle, and those devices then need some amount of maintenance over time.
Once an agency has tracking, it’s relatively simple to offer a computer interface for apps to access and tell riders about this information (the 5th column). Most of the agencies with tracking offer such an interface, but Ride On, MARC, and Loudoun Transit all have public tracking sites that provide some services to riders yet offer no way for other apps to tap into the information those sites contain.
What agencies can do
Agencies with red X’s on this chart can start thinking about how to provide schedule and/or real-time open data. Creating GTFS files isn’t especially difficult, though it does take some staff time. For agencies that use scheduling software, the manufacturers of that software often offer modules to export data as GTFS as well.
Some GTFS feeds could benefit from quality fixes. For example, WMATA’s Metrorail GTFS file doesn’t show the specific paths trains take, and paths are missing for a few bus routes. The “Transparent Metro Data Sets” Application Programming Interface (API), a special interface WMATA created to offer access to much of its data, does include the correct paths. But many people develop apps to access GTFS files for multiple cities. It’s much less likely they will put in extra development effort to specifically pull just these route shapes from this unique API.
The Circulator’s routes are part of the WMATA GTFS feed, which makes things even easier for apps than having to download a separate feed. One problem is that the route names are all cryptic: there’s “DCDGR” for the Dupont-Georgetown-Rosslyn Circulator, or “DC98” for the route which replaced the former 98 bus. Those are fine for internal systems inside the agencies, but they aren’t very clear to riders.
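Until the feed itself is fixed, each app has to paper over names like these on its own. The sketch below shows one way an app might do that, under some assumptions: that the cryptic code appears in the route_short_name field of routes.txt (the post does not say which field holds it), and that the app keeps its own hand-written override table. The names DISPLAY_NAMES and rider_label are invented for this example.

```python
# Hypothetical override table an app would have to maintain by hand.
# The one entry below comes from the example in the post.
DISPLAY_NAMES = {
    "DCDGR": "Dupont-Georgetown-Rosslyn Circulator",
}


def rider_label(route, overrides=DISPLAY_NAMES):
    """Pick a rider-facing name for one row of routes.txt (given as a dict).

    Assumes the cryptic code is stored in route_short_name.
    """
    short = (route.get("route_short_name") or "").strip()
    long_name = (route.get("route_long_name") or "").strip()
    # A hand-written override wins over whatever the feed says.
    if short in overrides:
        return overrides[short]
    # Otherwise prefer the long name and fall back to the short one.
    return long_name or short
```

Every app that wants readable names would need to build and update a table like this, which is why fixing the names once in the feed is the better solution.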
Agencies which have provided their data to Google but don’t offer the feeds publicly (like DASH, Ride On, and MARC) should post those feeds on their websites and publicly link to the feeds. They are already creating the GTFS files for Google, so it’s a trivial step to also let others download the same files.
WMATA also has much of the route data for other local bus systems in the region, which it uses in its trip planner. Agencies which don’t have GTFS files can give WMATA permission to include their data in its GTFS feed, as the Circulator does.
Agencies with AVL systems already on their vehicles should set up APIs to give apps access to the locations or predictions, and agencies without AVL can work toward getting the budget necessary to deploy AVL.
What others can do
Transit industry associations and vendors which sell technology to transit agencies can all encourage open data to be part of any contract. Vendors can encourage agencies to open their data and provide services to do so, and associations can encourage agencies to ask their vendors for these services.
The industry can also help move toward a clear standard for bus tracking. GTFS has become a standard for schedule and route data because large numbers of agencies went ahead and offered GTFS files. But there is not yet a consensus around what format to use to offer real-time predictions.
WMATA built its own API which provides the data in a certain format. Circulator, The Bus, and CUE all use Nextbus for tracking, which has its own API. ART uses another service, Connexionz. This unfortunately means that anyone who builds a real-time application and wants to incorporate multiple services has to support at least 3 different APIs.
There are efforts to create such standards, like GTFS-Realtime, but this hasn’t achieved the same widespread adoption as GTFS, nor has any other standard.
It’s still possible to build apps without a standard, and the Mobility Lab’s real-time screen project does connect to all 3 different systems in our region. But that requires extra work, not just for the Mobility Lab but for every other app creator who wants to offer predictions for multiple transit agencies.
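To give a rough sense of that extra work, here is a minimal sketch of the adapter code an app needs when each source reports predictions differently. The two payload formats below are invented purely for illustration and are not the actual WMATA, Nextbus, or Connexionz formats. The names Prediction, ADAPTERS, and merge_predictions are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One common shape the app converts every source into."""
    agency: str
    route: str
    minutes: int


def from_minutes_list(agency, payload):
    # Invented format: {"predictions": [{"route": "X", "minutes": "5"}]}
    return [Prediction(agency, item["route"], int(item["minutes"]))
            for item in payload["predictions"]]


def from_seconds_by_route(agency, payload):
    # Invented format: {"X": [300, 900]}, arrival times in seconds per route
    return [Prediction(agency, route, seconds // 60)
            for route, arrivals in payload.items()
            for seconds in arrivals]


# One adapter per format; every additional API means another entry here.
ADAPTERS = {
    "minutes_list": from_minutes_list,
    "seconds_by_route": from_seconds_by_route,
}


def merge_predictions(responses):
    """Combine (agency, format_name, payload) tuples into one sorted list."""
    merged = []
    for agency, format_name, payload in responses:
        merged.extend(ADAPTERS[format_name](agency, payload))
    # Soonest arrivals first, regardless of which source they came from.
    return sorted(merged, key=lambda p: p.minutes)
```

With a shared standard, the ADAPTERS table would shrink to a single entry, and adding a new agency would mean adding a URL rather than writing and maintaining another parser.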
The easier we make it to build apps, the more we’ll get. Ultimately, it would be great for one standard to emerge, and for the various vendors like Nextbus to agree to all offer data to apps in that same standard format.
Update: Commenter intermodal commuter pointed out the real-time status page for VRE. It combines some train positions from GPS and some from manual reports from conductors. There is not an API to access the data. I’ve corrected the chart.
Update 2: Commenter Adam noted that MARC is actually contained in the MTA Maryland GTFS file, but listed only as routes 300, 301, and 302; when we examined the feed, we didn’t realize these were not commuter buses. You can see the MARC lines on Transit Near Me (for example, by centering the map around Union Station).
Also, ACCS Web Manager Joe Chapline posted a status update about ART’s efforts to get into Google Transit; according to Chapline, this was delayed for a time due to contract issues, and now is awaiting action by the Google legal department, which I know from past personal experience is often understaffed and backlogged.