Transport for London (TfL) is a statutory body responsible for London's public transport network, managing buses, the Underground, the Docklands Light Railway, the Overground, and major roads. Their 'Open Data' policy means that they share much of their internal data with the public, which they say is currently powering over 600 apps for Londoners.
One interesting data source they share with the public is Santander Cycles (also known colloquially as Boris Bikes) usage data. Every bike journey is recorded, and the data runs from 2015 all the way up to 2025. It is organised into unwieldy weekly CSV files for download: https://cycling.data.tfl.gov.uk/#!usage-stats%2F. Each row of this data is one bike journey, with each journey starting from a particular bike station. That adds up to 9.2 million station-hours, 800 bike stations, and 144 weekly CSVs. See an example of the data below.
| Start Date       | StartStation Name                | End Date         | EndStation Name                     | Duration |
|:-----------------|:---------------------------------|:-----------------|:------------------------------------|---------:|
| 10/01/2016 00:00 | Drury Lane, Covent Garden        | 10/01/2016 00:04 | Frith Street, Soho                  |      240 |
| 10/01/2016 00:00 | Pott Street, Bethnal Green       | 10/01/2016 00:05 | Victoria Park Road, Hackney Central |      300 |
| 10/01/2016 00:00 | Harrington Square 2, Camden Town | 10/01/2016 00:20 | Baylis Road, Waterloo               |     1200 |
| 10/01/2016 00:01 | Canton Street, Poplar            | 10/01/2016 00:14 | Hewison Street, Old Ford            |      780 |
| 10/01/2016 00:01 | Cephas Street, Bethnal Green     | 10/01/2016 00:11 | Brick Lane Market, Shoreditch       |      600 |
We can take each row and aggregate this data up so that we can see the seasonality trends across multiple years:
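As a minimal sketch of that aggregation (toy rows in the shape of the sample table above; the `Start Date` column name and dd/mm/yyyy format are taken from that sample, everything else is made up):

```python
import pandas as pd

# Toy rows in the same shape as the raw TfL CSVs (dd/mm/yyyy hh:mm)
trips = pd.DataFrame({
    "Start Date": ["10/01/2016 00:00", "15/06/2016 08:30", "20/06/2016 09:00"],
    "Duration": [240, 300, 1200],
})

# Parse the start time and count trips per calendar month
trips["start"] = pd.to_datetime(trips["Start Date"], format="%d/%m/%Y %H:%M")
monthly = trips.set_index("start").resample("MS").size()

print(monthly)
```

On the full dataset, plotting `monthly` is what reveals the seasonal cycle across years.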
This dataset now gives us a glimpse into bike usage across London (the data doesn't contain every bike journey in London, but we can expect that Boris Bike usage is related to overall bike usage). For a Causal Data Science enthusiast, the natural next question is: how can we use this dataset to answer some interesting causal questions? What events occur that have a large impact on cycle journeys? What are some common large-scale disruptions that cause people to not be able to take the tube? How do workers show the value of their labour to their employers by withholding it? Strikes!
In this article I will be examining the causal impact of major tube strikes on cycle usage in London. Historic strikes are somewhat hard to pin down across the internet, but luckily for me there is an FOI request into strike action, which gives us dates of strike action at a line level, between 2014 and 2018.
As the data starts off as one row for every bike journey across all bike stations in London, we have some work to do to get it into a format we can use. We have 144 weekly CSVs that we convert to Parquet files to help with memory constraints. We then combine all these Parquet files into one big dataframe and group by bike station and hour.
| station_id | trips_start | ts |
|-------------:|:--------------------|-----:|
| 1 | 2016-01-10 09:00:00 | 4 |
| 1 | 2016-01-10 10:00:00 | 1 |
| 1 | 2016-01-10 11:00:00 | 2 |
| 1 | 2016-01-10 12:00:00 | 2 |
| 1 | 2016-01-10 13:00:00 | 2 |
TfL also provide coordinates for each bike station. We join the coordinates to their corresponding H3 cell. H3 is a hexagonal grid system developed at Uber that is useful for many spatial analysis tasks. The plot below shows how bike journeys are distributed across London.

We can now aggregate the journey data up to the H3 cell-day level along with some confounders that we think also affect cycling usage in London. These include weather and seasonality features.
```python
# Process in chunks to avoid memory spikes
chunk_size = 100_000
h3_cells = []
for i in range(0, len(bf), chunk_size):
    chunk = bf.iloc[i:i+chunk_size]
    h3_cells.extend([h3.latlng_to_cell(lat, lon, 8) for lat, lon in zip(chunk["lat"], chunk["lon"])])
    print(f"  Processed {min(i+chunk_size, len(bf)):,} / {len(bf):,}")
bf["h3_cell"] = h3_cells
```
```python
# Aggregate to cell-day
bf["day"] = pd.to_datetime(bf["trips_start"]).dt.date
cell_day = (
    bf.groupby(["h3_cell", "day"])
    .agg(
        total_trips=("ts", "sum"),
        frac_exposed=("strike_exposed", "mean"),
        n_stations=("station_id", "nunique"),
        temperature_2m=("temperature_2m", "mean"),
        precipitation=("precipitation", "mean"),
        is_weekend=("is_weekend", "first"),
        is_bank_holiday=("is_bank_holiday", "first"),
        is_school_holiday=("is_school_holiday", "first"),
        days_to_next_strike=("days_to_next_strike", "first"),
        days_since_last_strike=("days_since_last_strike", "first"),
        month=("month", "first"),
        year=("year", "first"),
        doy=("doy", "first"),
        lat=("lat", "mean"),
        lon=("lon", "mean"),
    )
    .reset_index()
)
```
This means that every row of our dataset now contains all Santander bike trips for one day in one H3 cell. We have 172 cells observed across 1,192 days.
We also filtered so that only cells with at least one tube stop within 500m were included – this is necessary to satisfy the Positivity Assumption. This assumption states that every unit has to have a non-zero probability of both treatment and control. If a cell has no tube stops within 500m, it has effectively no chance of being treated (we can reasonably assume that a commuter who can't use the tube because of strikes would walk up to 500m to use a Santander bike).
```python
cell_day = cell_day[cell_day["n_tube_within_500m"] >= 1].copy()
```
This gives us a cell-day dataset with 62 H3 cells, 66,039 rows and 98.4% of cells ever treated.
Next we can define our outcome and treatment variables. As each cell can have differing levels of expected bike usage, we create our outcome variable to be relative to each cell's capacity – the total trips for each cell on each day divided by the number of bike stations in that cell. We take the log so that our coefficient tells us about proportional changes rather than absolute ones and so that the statistical assumptions of the regression are satisfied, and we add one so that quiet cell-days with zero recorded trips are included in the analysis rather than silently dropped.
\[
Y_{i,t} = \log\left(1 + \frac{\text{Total Bike Trips in cell } i \text{ on day } t}{\text{Number of Bike Stations in cell } i}\right)
\]
We can calculate the outcome variable in Python with the following code.
```python
cell_day["y_per_station_log1p"] = np.log1p(cell_day["total_trips"] / cell_day["n_stations"])
```
Defining the treatment variable for strike exposure isn't as straightforward. We know which tube lines were striking on each day – but this information doesn't map neatly to each cell, as each tube line snakes across London. When we consider the question of what happens to bike usage when tube lines are not operational, it is helpful to first identify when bike stations are "near" tube stations that are affected by strikes. We have defined a bike station to be affected by a strike if it is within 400m of a tube station that serves one of the striking lines.
We then define an H3 cell to be strike-affected if any bike station within that cell is strike-affected. This is now our treatment variable.
\[
T_{i,t} =
\begin{cases}
1, & \text{if cell } i \text{ is strike-exposed on day } t \\
0, & \text{otherwise}
\end{cases}
\]
To construct this treatment variable for our dataset, we first have to create a strike-affected column for our station-level data. We do this using the following function, which takes in our station-hour data, a dataframe which tells us which lines were striking on each day, and a dataframe which tells us which stations are served by each striking line.
```python
def attach_strikes_to_base(
    base: pd.DataFrame,
    strikes_daily: pd.DataFrame,
    station_line_map: pd.DataFrame,
) -> pd.DataFrame:
    """
    Attach a binary strike_exposed indicator to the station-hour panel.
    A station-hour is treated (strike_exposed = 1) if any Underground line
    serving that station is on strike on that day.
    base must have columns: station_id, trips_start (datetime), ts (numeric trip count).
    """
    df = base.copy()
    df["date"] = pd.to_datetime(df["trips_start"]).dt.floor("D")
    station_day_treat = (
        strikes_daily
        .merge(station_line_map[["station_id", "affected_line"]], on="affected_line", how="inner")
        .drop_duplicates(subset=["station_id", "date"])
        .assign(strike_exposed=1)
        [["station_id", "date", "strike_exposed"]]
    )
    df = df.merge(station_day_treat, on=["station_id", "date"], how="left")
    df["strike_exposed"] = df["strike_exposed"].fillna(0).astype(int)
    return df.drop(columns=["date"])
```
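To see the join logic in isolation, here is a toy example (made-up station IDs and line names) that reproduces the core of the function above: an inner merge of strike days onto the station-line map, then a left merge onto the station panel, with unmatched rows filled as untreated.

```python
import pandas as pd

# Toy inputs: one strike day on a hypothetical "victoria" line
strikes_daily = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-10"]),
    "affected_line": ["victoria"],
})
station_line_map = pd.DataFrame({
    "station_id": [1, 2],
    "affected_line": ["victoria", "central"],
})
base = pd.DataFrame({
    "station_id": [1, 2],
    "trips_start": pd.to_datetime(["2016-01-10 09:00", "2016-01-10 09:00"]),
    "ts": [4, 3],
})

# Station-day treatment table: only station 1 is served by the striking line
station_day_treat = (
    strikes_daily
    .merge(station_line_map, on="affected_line", how="inner")
    .assign(strike_exposed=1)[["station_id", "date", "strike_exposed"]]
)

# Left-join onto the panel; stations with no strike match get 0
base["date"] = base["trips_start"].dt.floor("D")
out = base.merge(station_day_treat, on=["station_id", "date"], how="left")
out["strike_exposed"] = out["strike_exposed"].fillna(0).astype(int)

print(out[["station_id", "strike_exposed"]])
```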
When we aggregate the station-hour dataframe to cell-day level we take the mean of the strike_exposed column into a new column frac_exposed, and any cells with a positive frac_exposed become treated cells.
```python
cell_day["treated"] = (cell_day["frac_exposed"] > 0).astype(int)
```
More detail on the data wrangling can be found at https://github.com/stucsk99/tfl_bike_casual/blob/main/01_data_pipeline.ipynb
Now that we've defined our outcome and treatment variables, let's take a step back and talk about the underlying causal theory that underpins all the results we'll arrive at in this article.
What is the question we want to ask?
The causal mechanism underlying our analysis is substitution. When a tube line strikes, commuters who would normally travel underground are displaced and must find an alternative. We argue that for commuters near major interchange stations, Santander Bikes represent the most accessible alternative: they are available without pre-registration, priced for short journeys, and physically present at the stations where displaced commuters emerge. This substitution story is what connects our treatment variable to our outcome through a credible causal pathway rather than mere correlation.
Strike occurs → tube commuters cannot travel → these commuters look for alternatives → some walk to a nearby Santander dock → bike trips increase. Each arrow in that chain is a step in the mechanism. Without it, even a statistically significant result is just a correlation with a story attached. With it, you have a reason to believe the effect is real.
The causal mechanism we're describing can be expressed by the following structural causal model.

Because strike timing is determined by labour negotiations rather than by anything related to cycling demand, we have good reason to believe that strike days are not systematically different from non-strike days in ways that would independently affect bike usage. A strike called on a Tuesday in January is not called because January Tuesdays are unusually good or bad for cycling – it is called because a wage negotiation broke down. This makes the counterfactual comparison credible: the bike usage we observe on comparable non-strike days is a reasonable approximation of what would have happened on strike days had the strike not occurred.
Now that we have our causal mechanism stated, we can carry on with our causal analysis. But before we do that, let's go through one of the essential building blocks of causal inference – the potential outcomes framework.
Potential Outcomes
The fundamental problem of causal inference is that we don't observe the counterfactual outcomes – we never know what would have happened to bike usage on a strike day, had that strike not occurred. This is by definition unobservable.
In an ideal world, we would observe both potential outcomes for each unit: \(Y_{i,t}(0)\), which is the potential outcome if cell \(i\) had not experienced a strike on day \(t\), and \(Y_{i,t}(1)\), which is the potential outcome if it did experience a strike. From here we can define the individual treatment effect for cell \(i\) on day \(t\), which is the difference between the two potential outcomes:
\[
\tau_{i,t} = Y_{i,t}(1) - Y_{i,t}(0)
\]
We would love to know this quantity for each observation, but as mentioned above, we only ever observe one of the two potential outcomes. The logical next step is to average this effect over all units. This is the Average Treatment Effect (ATE):
\[
ATE = E[Y_{i,t}(1) - Y_{i,t}(0)] = E[\tau_{i,t}]
\]
This is the expected treatment effect for a unit chosen at random from the full population. In our setting, it answers: for a randomly chosen cell-day in our panel, what is the expected change in log bike trips per station if that cell-day were to become strike-exposed?
We can also define another treatment effect: the Average Treatment Effect on the Treated (ATT):
\[
ATT = E[Y_{i,t}(1) - Y_{i,t}(0) \mid D_i = 1] = E[\tau_{i,t} \mid D_i = 1]
\]
Where \(D_i\) is the treatment indicator. This shifts the focus onto units that were actually treated: for a cell-day that was actually strike-exposed, what was the causal effect of that exposure?
Naive Treatment Effect
Before we get into how we estimate these quantities using robust causal methods, we can first illustrate what goes wrong when we estimate the ATE naively. To do this as simply as possible, we could estimate the ATE as the difference in sample means between the treated and control observations. That is,
\[
\tau^{naive} = \overline{Y}_{D=1} - \overline{Y}_{D=0}
\]
```python
treated_mean = cell_day.loc[cell_day["treated"] == 1, "y_per_station_log1p"].mean()
control_mean = cell_day.loc[cell_day["treated"] == 0, "y_per_station_log1p"].mean()
print(f"Naive diff : {np.expm1(treated_mean - control_mean)*100:+.1f}%")
```
In our data, this gives a naive difference of +5.5%. Cells with any strike exposure have considerably higher log bike trips per station than cells without. But this is not a credible causal estimate. We can decompose the naive difference algebraically to see exactly what it is estimating:
\[
\overline{Y}_{D=1} - \overline{Y}_{D=0} = \underbrace{E[Y_{i,t}(1) - Y_{i,t}(0) \mid D_i = 1]}_{ATT} + \underbrace{E[Y_{i,t}(0) \mid D_i = 1] - E[Y_{i,t}(0) \mid D_i = 0]}_{\text{selection bias}}
\]
The first term is the ATT, which is what we want. The second term is selection bias – the difference in control potential outcomes between treated and untreated units. In our case, this bias is likely positive: cells that are strike-exposed are near tube lines, which means they are in denser, more central areas of London that have higher baseline bike usage regardless of any strike. The naive estimate conflates the effect of strikes with the pre-existing advantage of centrally located cells.
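A tiny numerical example (entirely made-up numbers) makes the decomposition concrete: give treated cells a higher baseline than control cells and a true effect of +1, and the naive difference overstates the ATT by exactly the baseline gap.

```python
import numpy as np

# Made-up potential outcomes: treated cells are central (baseline 10),
# control cells are peripheral (baseline 8); true effect of treatment = +1
y0_treated = np.array([10.0, 10.0, 10.0])   # Y(0) for treated cells
y1_treated = y0_treated + 1.0               # Y(1) for treated cells
y0_control = np.array([8.0, 8.0, 8.0])      # Y(0) for control cells

att = (y1_treated - y0_treated).mean()                  # true effect: 1.0
selection_bias = y0_treated.mean() - y0_control.mean()  # baseline gap: 2.0
naive = y1_treated.mean() - y0_control.mean()           # what we'd measure: 3.0

assert naive == att + selection_bias
print(f"naive={naive}, ATT={att}, selection bias={selection_bias}")
```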
Eliminating this selection bias is the entire job of the methods that follow.
Panel Data
Our dataset has a structure that is particularly well-suited to addressing selection bias. It is a panel. A panel dataset observes the same units repeatedly over time. Our particular panel has the following structure
\[
\{ X_{i,t}, D_{i,t}, Y_{i,t} \}, \quad i = 1,\dots,N, \; t = 1,\dots,T
\]
Where \(i\) indexes our H3 cells and \(t\) indexes the days observed in our dataset, giving us \(N \times T\) total observations.
The key insight that panel data provides is this: if we observe the same cell on multiple days, we can separate the time-invariant component of that cell's outcome from the day-specific variation. A cell near Bank station is always going to be busier than a cell near Pimlico – that is a permanent feature of the cell's location, not something that changes with strikes. Panel methods let us account for this permanent feature without ever having to measure it directly.
We can use the inherent set-up of the panel data to model the treatment effect using a two-way fixed effects (TWFE) model. This is a generalisation of the classic Difference-in-Differences method. The model is set up in the following way:
\[
Y_{i,t} = \alpha_{i} + \lambda_{t} + \tau D_{i,t} + \beta X_{i,t} + \epsilon_{i,t}
\]
Where \(Y_{i,t}\) is our outcome variable for cell \(i\) on day \(t\), \(\alpha_i\) is the fixed effect for cell \(i\), \(\lambda_t\) is the fixed effect for day \(t\), \(\tau\) is the causal treatment effect, \(D_{i,t}\) is the treatment indicator, \(\beta\) are the coefficients for the covariates \(X_{i,t}\), and \(\epsilon_{i,t}\) are our errors.
In this model, we have two fixed effects, \(\alpha_i\) and \(\lambda_t\), for each cell \(i\) and each day \(t\), which act as dummy variables for each cell and day. The cell fixed effect contains all time-invariant cell characteristics (all the geographical features of cell \(i\) that don't change over time) and the date fixed effect contains all cell-invariant variation (shocks specific to a day but common to all cells). This is equivalent to demeaning within each cell and within each date, which removes all time-invariant cell characteristics and common day-level shocks.
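For a balanced panel, that two-way within transformation amounts to subtracting the cell mean and the date mean and adding back the grand mean. A small sketch on synthetic data (not the article's dataset) shows that after this transformation, both the per-cell and per-day means are zero – the fixed effects have been swept out:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cells, days = range(4), range(5)
panel = pd.DataFrame(
    [(i, t) for i in cells for t in days], columns=["cell", "day"]
)
# Outcome = cell effect + day effect + noise
panel["y"] = (
    panel["cell"].map({0: 1.0, 1: 3.0, 2: 5.0, 3: 7.0})
    + panel["day"] * 0.5
    + rng.normal(0, 0.1, len(panel))
)

# Two-way within transformation (valid for a balanced panel):
# subtract the cell mean and the day mean, add back the grand mean
y_dm = (
    panel["y"]
    - panel.groupby("cell")["y"].transform("mean")
    - panel.groupby("day")["y"].transform("mean")
    + panel["y"].mean()
)

# Both the cell means and the day means of y_dm are numerically zero
print(y_dm.groupby(panel["cell"]).mean().abs().max())
print(y_dm.groupby(panel["day"]).mean().abs().max())
```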
We can simply run this regression using the ols function from statsmodels.formula.api:
```python
twfe = smf.ols(
    """y_per_station_log1p ~ treated
       + temperature_2m + precipitation
       + is_weekend + is_bank_holiday + is_school_holiday
       + days_to_next_strike + days_since_last_strike
       + C(h3_cell) + C(date_str)""",
    data=cell_day,
).fit(
    cov_type="cluster",
    cov_kwds={"groups": cell_day["h3_cell"]},
)
```
Note that we can't rely on ordinary OLS standard errors, since observations from the same cell across different days are correlated. If we ignored this correlation, we would systematically understate the uncertainty in \(\hat{\tau}\), producing confidence intervals that are too narrow and p-values that are too small. We address this using the standard solution of clustering errors at the cell level. This allows for arbitrary correlation between the residuals \(\epsilon_{i,t}\) and \(\epsilon_{i,s}\) for the same cell \(i\) across any two dates \(t\) and \(s\), while maintaining the assumption of independence across cells.
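To see why this matters, here is a self-contained simulation (synthetic data, not the article's panel): the regressor varies only at the cluster level and the errors share a large cluster-level shock, so the naive OLS standard errors are far too optimistic compared to the clustered ones.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_clusters, n_per = 50, 20

# Cluster-level regressor and a large cluster-level error component
cluster_x = rng.normal(size=n_clusters)
cluster_u = rng.normal(scale=1.0, size=n_clusters)

df = pd.DataFrame({
    "g": np.repeat(np.arange(n_clusters), n_per),
    "x": np.repeat(cluster_x, n_per),
})
# True slope is 2; errors are strongly correlated within clusters
df["y"] = 2.0 * df["x"] + np.repeat(cluster_u, n_per) + rng.normal(scale=0.1, size=len(df))

ols = smf.ols("y ~ x", data=df).fit()
clustered = smf.ols("y ~ x", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["g"]}
)

print(f"naive SE: {ols.bse['x']:.4f}, clustered SE: {clustered.bse['x']:.4f}")
```

The point estimates are identical; only the standard errors change, with the clustered ones correctly reflecting that the effective sample size is closer to the number of clusters than the number of rows.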
Results
Our TWFE model gives us an increase of 3.95% in Santander bike usage on strike days, with a p-value of 0.097.
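A percentage figure like this comes from converting the coefficient on `treated` (which is in log points) into a proportional change; a sketch, using a hypothetical coefficient value close to the one reported:

```python
import numpy as np

# Hypothetical TWFE coefficient on `treated`, in log points
tau = 0.0387

# Proportional change implied by a log-outcome coefficient: exp(tau) - 1
pct_change = (np.exp(tau) - 1) * 100
print(f"Estimated effect: {pct_change:+.2f}%")
```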
Before we dive deeper into these results, we first focus on some changes we made to our data to tighten the causal mechanisms that we want to understand.
Having established that every cell in our analysis must have at least one tube station within 500 metres – our positivity condition – we apply a stronger restriction motivated by the causal mechanism itself. Not all tube stations generate equal commuter displacement when they strike. The 42 stations we focus on are the largest interchange stations of central London: Bank, Liverpool Street, King's Cross, Waterloo, Victoria, and their neighbours. These are the stations where thousands of commuters converge each morning, where Santander Bike docks are densest, and where the substitution from tube to bike is most frictionless – a displaced commuter walks out of a closed station and finds a rack of bikes within metres.
At more peripheral stations, even where a Santander dock exists nearby, the displacement mechanism is weaker. Fewer commuters are purely tube-dependent, and the walking distance to a bike dock is more likely to exceed what a time-pressured commuter will tolerate. Restricting to the 32 cells within 800 metres of these 42 major interchange stations is therefore a deliberate focus on the geographic population where both the demand shock from the strike and the supply response from the bike network are sufficiently concentrated for the substitution effect to be detectable.
```python
# Get centroids of all unique cells in cell_day
unique_cells = cell_day["h3_cell"].unique()
cell_centroids = pd.DataFrame([
    {"h3_cell": c,
     "lat": h3.cell_to_latlng(c)[0],
     "lon": h3.cell_to_latlng(c)[1]}
    for c in unique_cells
])

# Build a KD-tree over the 42 station coordinates
station_coords = np.radians(CENTRAL_42[["lat", "lon"]].values)
tree = cKDTree(station_coords)

# Query each cell centroid
cell_coords = np.radians(cell_centroids[["lat", "lon"]].values)
radius_rad = 0.8 / 6371.0  # 800m in radians

# For each cell, find the distance to the nearest of the 42 stations
nearest_dist_rad, _ = tree.query(cell_coords, k=1)
cell_centroids["dist_to_central_42_km"] = nearest_dist_rad * 6371.0
cell_centroids["near_central_42"] = nearest_dist_rad <= radius_rad

central_cells = set(
    cell_centroids.loc[cell_centroids["near_central_42"], "h3_cell"]
)

# Filter down to the central cells
cell_day_central = cell_day[
    cell_day["h3_cell"].isin(central_cells)
].copy()
```
Days that are 300 days away from any strike have very different seasonal characteristics from strike days, and have no causal relevance to the comparison. Including them forces the date fixed effects to span a wide seasonal range, and the cell fixed effects are estimated from a period that is not directly relevant to the comparison. By restricting to a local window of 45 days around each strike date we create a cleaner experiment: the control days look more like the counterfactual for the treated days, and seasonal confounding is reduced.
```python
sub = cell_day_central[cell_day_central["days_to_nearest"] <= 45].copy()
```
We now have three different versions of the basefile, each with an increasingly powerful signal-to-noise ratio.
| Basefile Version                          | Rows   | Treated % |
|:------------------------------------------|:-------|----------:|
| Only cells within 500m of a tube stop     | 66,039 |      0.82 |
| Only cells close to Central Stations      | 34,590 |      0.94 |
| Only days within 45 days of strike days   | 16,799 |      1.95 |
The plot below shows the different TWFE estimates across the different basefile specifications. The most causally powerful set-up of our panel data achieves an estimated treatment effect of 3.95% with a p-value of 0.097.

Our p-value sits above the conventional p=0.05 threshold. This means that a result as large as our 3.95% increase would be expected to occur by chance 9.7% of the time. Although our p-value does not clear the conventional benchmark, we can see that our three estimates are consistently positive, and the width of the confidence interval reflects the limited number of strike events in the FOI data, not the absence of an effect.
Causal Inference Assumptions
Before getting too carried away with these results, we have to stop and consider the assumptions that must hold for the TWFE estimate to have a causal interpretation.
Positivity/Overlap requires that every unit has a non-zero probability of being treated. We have addressed this by making sure every cell in the panel has at least one tube stop within 500m.
Parallel trends requires that, in the absence of strikes, treated and control cells would have experienced the same time trend in bike usage. This is plausible in our setting because strike timing is determined by labour negotiation dynamics — the decision to strike on a particular date is driven by bargaining outcomes between TfL management and unions, not by anything related to the underlying trajectory of bike usage.
No anticipation requires that cells don't change their behaviour before treatment occurs — that the announcement of a strike doesn't itself alter bike usage in the days before the strike. This is partially addressed by the inclusion of days_to_next_strike as a covariate in the controlled specification, which captures any systematic pre-strike trend. We note that for genuinely unannounced strikes, the no-anticipation assumption is mechanically satisfied.
SUTVA (Stable Unit Treatment Value Assumption, Rubin 1980) requires that the potential outcomes of one cell do not depend on the treatment status of other cells. This is the assumption most likely to be violated in our setting: a strike displaces commuters across a wide geographic area, potentially raising bike usage at cells beyond those directly adjacent to striking lines. If control cells also gain trips during strikes, the treated-versus-control contrast is compressed, so SUTVA violations will attenuate our estimate towards zero, meaning our +3.95% should be interpreted as a lower bound on the true effect for the most directly exposed cells.
Closing Remarks
This article set out to answer a simple question: do London tube strikes push commuters onto Santander Bikes? The answer, based on a two-way fixed effects analysis of four years of TfL open data, is yes – but arriving at that answer was considerably less straightforward than the clean result might suggest.
Working with real-life data isn't easy. To get the journey data into a usable format while parsing 144 weekly CSVs, I had to reconcile inconsistent column schemas across data releases, correct a silent naming mismatch between strike line identifiers, and rebuild the spatial mapping between bike stations and tube stops several times.
This was all before considering the various causal assumptions necessary to build a credible argument. Coming from an ML background, I also spent a non-trivial amount of time investigating meta-learners (S-, T-, and X-learners, a family of predictive machine learning methods for estimating treatment effects) for this problem. These could have given us richer insight – the conditional average treatment effect, or CATE, which would tell us how the treatment effect varies across London.
I learned the hard way that the tool didn't match the problem. Panel data with a recurring binary treatment and a strong geographic identification story wants a fixed effects regression, not a cross-sectional ML estimator.

