GraphQL is one of the most exciting things I’ve worked on in a long time
Why GraphQL is exciting:
Wraps existing REST endpoints with minimal code
Reduces demands on API teams to create aggregate endpoints
Creates a single source of truth for data
Enables mock responses and local development
Comes with browser-based tools for exploring data and testing queries
This promised to solve a lot of our pain points
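To illustrate the first point: a resolver can put a GraphQL field in front of an existing REST endpoint with very little code. This is a minimal sketch in plain JavaScript, not code from our services. The `makeResolvers` and `fetchJsonWithFetch` names, the `getComic` field, and the returned field names are hypothetical. The URL pattern follows the xkcd endpoint that appears in the error example further on.

```javascript
// Hypothetical resolver map: each GraphQL field delegates to an existing REST endpoint.
// `fetchJson` is injected so the resolver can be exercised without network access.
function makeResolvers(fetchJson) {
  return {
    Query: {
      // Resolves `getComic(id: ID!)` by calling the REST API and reshaping its JSON.
      getComic: async (_parent, { id }) => {
        const raw = await fetchJson(`https://xkcd.com/${id}/info.0.json`);
        return { id: String(raw.num), title: raw.title, imageUrl: raw.img };
      },
    },
  };
}

// Default fetcher, assuming Node 18+ (global fetch). Throws on non-2xx responses.
async function fetchJsonWithFetch(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Request to ${url} failed with status ${res.status}`);
  return res.json();
}
```

In a server you would pass `fetchJsonWithFetch` to `makeResolvers`. In tests or local development, a stub that returns canned JSON works instead, which is also how mock responses can be made to work.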
How IBM Cloud is built:
Node µ-service architecture
30+ µ-service teams
Each µ-service (“plugin”) is a separate codebase
Teams control their own workflow
This has its downsides:
Things can change in 30+ directions at any given time
Front-ends need data from multiple µ-services
Internal documentation & architecture are inconsistent
Code can be wildly inconsistent between µ-services
GraphQL has solutions:
Changes are centralized in the GraphQL µ-service
Data access happens through a single endpoint
Documentation is centralized and consistent
Cleaner separation between data and presentation
I wanted to start using it in production immediately
Not everyone was on board
There are complications:
Who “owns” the GraphQL µ-service?
How can teams make independent changes?
Can one bad commit take down the whole service?
Doesn’t an extra layer make it harder to trace errors?
We wanted the benefits of GraphQL… but could we afford the trade-offs?
We needed answers
Can we...
Centralize data, but let teams keep control?
Design an approach that improves error handling?
Make it so easy teams want to switch?
Build a service that can handle IBM’s scale?
Challenge #1:
Centralize Data, but Decentralize Control
The ideal solution:
Each team maintains their own GraphQL schema... but that schema is aggregated by a central µ-service.
If this was going to work, we needed a standardized format for sharing schemas.
We call these Data Sources
Each data source is an independent GitHub repo, which means:
✅ No bottlenecks: Each team commits and deploys code independently.
✅ No loss of control: Each team owns their data source.
✅ No accidental borking: Each team’s code has individual test suites.
How do we combine the data sources?
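Here is one possible approach, sketched under assumptions. Suppose every data source exports the same hypothetical shape (`namespace`, `typeDefs`, `resolvers`). The central µ-service can then collect the SDL strings and merge the resolver maps. It refuses duplicate namespaces and duplicate fields, so one team can't silently overwrite another. This is not the GrAMPS implementation. A real server would pass the result to a schema builder such as graphql-tools' `makeExecutableSchema({ typeDefs, resolvers })`.

```javascript
// Hypothetical data-source shape: each team's repo would export one of these.
// Root types use `extend type Query` so several sources can add fields to Query.
const xkcd = {
  namespace: 'XKCD',
  typeDefs: `
    extend type Query { getComic(id: ID!): XKCD_Comic }
    type XKCD_Comic { id: ID! title: String }
  `,
  resolvers: { Query: { getComic: () => null } },
};

// Combine data sources into one typeDefs list and one resolver map.
function combineDataSources(dataSources) {
  const namespaces = new Set();
  // Base root type that each source's `extend type Query` builds on.
  const typeDefs = ['type Query { _empty: String }'];
  const resolvers = {};
  for (const source of dataSources) {
    if (namespaces.has(source.namespace)) {
      throw new Error(`Duplicate data source namespace: ${source.namespace}`);
    }
    namespaces.add(source.namespace);
    typeDefs.push(source.typeDefs);
    for (const [typeName, fields] of Object.entries(source.resolvers)) {
      resolvers[typeName] = resolvers[typeName] || {};
      for (const [fieldName, resolve] of Object.entries(fields)) {
        // Fail loudly instead of letting a later source overwrite an earlier one.
        if (Object.prototype.hasOwnProperty.call(resolvers[typeName], fieldName)) {
          throw new Error(`${source.namespace} redefines ${typeName}.${fieldName}`);
        }
        resolvers[typeName][fieldName] = resolve;
      }
    }
  }
  return { typeDefs, resolvers };
}
```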
Challenge #2:
Improve Error Handling
What Makes an Error Helpful?
Clear description of what went wrong
Clarity about where the error occurred
GraphQL errors vs. underlying data access issues
Information to help with tracing bugs
Unique IDs shared on the client and server side
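These properties fit naturally into a single error object. The following is a minimal sketch built around a hypothetical `createDataSourceError` helper, not the GrAMPS API. `docsLink` and `targetEndpoint` match the identifiers used on these slides. The other field names are assumptions chosen to mirror the labels in the error output below.

```javascript
// Prefer the platform's crypto.randomUUID() (Node 19+, modern browsers).
// Otherwise fall back to a Math.random-based v4-style ID, which is fine for
// correlating logs but not for anything security-sensitive.
function makeGuid() {
  if (globalThis.crypto && typeof globalThis.crypto.randomUUID === 'function') {
    return globalThis.crypto.randomUUID();
  }
  return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, (c) => {
    const r = (Math.random() * 16) | 0;
    const v = c === 'x' ? r : (r & 0x3) | 0x8;
    return v.toString(16);
  });
}

// Hypothetical helper: one error carrying what went wrong, where, and how to trace it.
function createDataSourceError({ description, errorCode, graphqlModel, targetEndpoint, docsLink, data }) {
  const error = new Error(description);
  error.description = description;       // what went wrong
  error.errorCode = errorCode;           // e.g. 'XKCDModel_Error'
  error.graphqlModel = graphqlModel;     // where: the GraphQL layer...
  error.targetEndpoint = targetEndpoint; // ...vs. the underlying data access call
  error.docsLink = docsLink;             // where to read more
  error.data = data;                     // the input that triggered the error
  error.guid = makeGuid();               // shared by the client response and the server log
  return error;
}
```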
Client-Side Errors in Development
In production, we can’t show some of these details:
Docs link may be behind our firewall
Target endpoint may not be public
Client-Side Errors in Production
docsLink and targetEndpoint are removed in production.
Client and server errors share a GUID
Error: Could not load the given xkcd comic (178460c1-c8d7-42c2-ba0e-f617afb5d3fd)
Description: Could not load the given xkcd comic
Error Code: XKCDModel_Error
GraphQL Model: XKCDModel
Target Endpoint: https://xkcd.com/2000/info.0.json
Documentation: https://ibm.biz/gramps-data-source-tutorial
Data: {
  "id": "2000"
}
Using the GUID, we can find the docsLink and targetEndpoint in the server logs.
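Below is a minimal sketch of the production behavior described above. It assumes each thrown error carries `guid`, `docsLink`, and `targetEndpoint` properties. The `makeFormatError` factory and its `isProduction` and `log` options are hypothetical. The only library detail it relies on is that graphql-js wraps thrown errors and keeps the original on `originalError`.

```javascript
// Hypothetical formatError factory. The full error, including docsLink and
// targetEndpoint, is always logged server-side under its GUID. In production
// those two fields are stripped from what the client receives.
function makeFormatError({ isProduction, log }) {
  return function formatError(err) {
    // graphql-js wraps thrown errors; the original is on `originalError`.
    const source = err.originalError || err;
    const details = {
      guid: source.guid,
      message: source.message,
      errorCode: source.errorCode,
      docsLink: source.docsLink,
      targetEndpoint: source.targetEndpoint,
      data: source.data,
    };
    log(details); // server log entry, searchable by GUID
    if (!isProduction) return details;
    // Drop internal-only fields; everything else, including the GUID, goes to the client.
    const { docsLink, targetEndpoint, ...clientSafe } = details;
    return clientSafe;
  };
}
```

If `isProduction` is tied to something like `process.env.NODE_ENV === 'production'`, a support ticket only needs to quote the GUID shown on the client. The endpoint and docs link can then be looked up in the logs.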
This Means...
Errors are normalized across all data sources
Support tickets can directly reference details in logs
Errors are clear and come with documentation
The source of a given error is immediately clear
Implementation Is Optional and Easy
import express from 'express';
import bodyParser from 'body-parser';
import { graphqlExpress } from 'apollo-server-express';
import { grampsExpress } from '@gramps/gramps-express';
import schemaOne from '@gramps/data-source-one';
import schemaTwo from '@gramps/data-source-two';

const app = express();
app.use(bodyParser.json());

// Combine the data sources; the result is exposed on req.gramps.
app.use(grampsExpress({ dataSources: [schemaOne, schemaTwo] }));

app.use(
  '/graphql',
  graphqlExpress(req => ({
    schema: req.gramps.schema,
    context: req.gramps.context,
    // The one added line: opt in to the normalized GrAMPS error format.
    formatError: req.gramps.formatError,
  })),
);
Challenge #3:
Make Development So Easy Teams Want to Use It
If we wanted teams to start using GraphQL, we needed it to be dead simple to get started
We created a data source starter kit:
Strong starting point for new data sources
Step-by-step tutorial for building a new data source
$ gramps --live
============================================================
GrAMPS is running in live mode on port 8080
GraphiQL: http://localhost:8080/graphiql
============================================================
But there was a snag:
“How do we run a local instance of the GraphQL µ-service if the data
source we’re developing is already installed? Won’t they collide?”
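One way to avoid the collision, sketched under assumptions: key data sources by namespace and let the locally developed copy replace the installed one before the schema is built. The `namespace` property and the `withLocalOverrides` helper are hypothetical, not GrAMPS features.

```javascript
// Hypothetical: a local data source replaces an installed one with the same namespace.
// Installed order is preserved; local-only data sources are appended at the end.
function withLocalOverrides(installed, local) {
  const localByNamespace = new Map(local.map((ds) => [ds.namespace, ds]));
  const merged = installed.map((ds) => localByNamespace.get(ds.namespace) || ds);
  const installedNamespaces = new Set(installed.map((ds) => ds.namespace));
  for (const ds of local) {
    if (!installedNamespaces.has(ds.namespace)) merged.push(ds);
  }
  return merged;
}
```

Because the override happens before the data sources are combined, the duplicate-namespace problem never reaches the schema builder.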
“Okay, sure, but how long did it take to get GraphQL into production?”
We started working on the GraphQL µ-service in May
It hit production in July
After the dust settled, we realized two things
#1
“Holy shit, if everyone wrote their data sources using this format, the dev community could share GraphQL data sources as easily as we share npm packages.”