Statue can be found in Weimar, Park an der Ilm (but Shakespeare obviously doesn’t speak)
Welcome to the ninth post in my “Plotly with code” series! If you missed the first one, you can check it out in the link below, or browse through my “one post to rule them all” to follow along with the entire series or other topics I’ve previously written about.
Awesome Plotly with Code Series (Part 1): Alternatives to Bar Charts
A short summary of why I’m writing this series
My go-to tool for creating visualisations is Plotly. It’s incredibly intuitive, from layering traces to adding interactivity. However, whilst Plotly excels at functionality, it doesn’t come with a “data journalism” template that offers polished charts right out of the box.
That’s where this series comes in: I’ll be sharing how to transform Plotly’s charts into sleek, professional-grade charts that meet data journalism standards.
PS: All images are authored by myself unless otherwise specified.
Intro – Clustered columns cluster your mind
How many times have you used multiple colours to represent multiple categories in a bar chart? I bet quite a few…
These multiple colours mixed with multiple categories feel like you are clustering bars together. Clustering doesn’t seem like an inviting word when you are communicating insights. Sure, clustering is useful when you are analysing patterns, but when you communicate what you found from those patterns, you should probably be looking to remove, clean and declutter (decluttering is my golden rule after having read Cole Nussbaumer’s Storytelling with Data book).
In Awesome Plotly with code series (Part 4): Grouping bars vs multi-coloured bars, we already covered a scenario where using colours to represent a 3rd dimension made it quite difficult for a reader to understand. The difference we will be covering in this blog is when the cardinality of these categories explodes. For example, in the Part 4 blog, we represented countries with continents, which are really easy to mentally map. However, what happens if we try to represent food categories with countries?
Now, that is a different problem.
What will we cover in this blog?
- Scenario 1. To subplot or to stack bars? That’s the question.
- Scenario 2. How on earth to plot 5 countries against 7 types of food?
- Scenario 3. Clustered charts fail to convey change over 2 periods for 2 groups.
PS: As always, code and links to my GitHub repository will be provided along the way. Let’s get started!
Scenario 1: Visualisation techniques… To subplot or to stack bars? That’s the question.
Imagine you are a consultant presenting at a conference about how the workforce is distributed in each country. You collected the required data, which might look like the screenshot below.
You want the chart to show the percentage of each sector by country. You don’t think too much about the chart design, and use the default output from plotly.express.
…
Where do I think this plot has issues?
The first thing to say is how poor this chart is at telling an interesting story. Some clear issues are:
- You have to use a key, which slows down understanding. Back and forth we go, between bar and key
- You don’t have much space for data labels. I’ve tried adding them, but the bars are too narrow, and the labels are rotated. So either you are the exorcist child or you keep trying to decipher a value based on the y-axis or the provided gridlines.
- There are just too many bars, in no particular order. You can’t rank clustered bars in the same way as you can rank bars showing a single variable. Do you rank by the value of a specific “sector” category? Do you order alphabetically by the x-axis or the legend categories?
- Comparing the top of the bars is nearly impossible. Say that you want to check if Vietnam has more workforce in construction than Spain… was it a fun exercise to look for the country, then figure out that construction is the pink bar and somehow look across the chart? Nope. I mean, if I hadn’t added the labels (even if rotated), you would probably not have been able to tell the difference.
Let’s see if there are better alternatives to the chart above.
Alternative #1. Using subplots.
If the story you want to tell is one where the focus is on comparing which countries have the highest percentage per sector, then I would recommend separating categories into subplots. In Awesome Plotly with code series (Part 8): How to balance dominant bar chart categories, we already introduced the use of subplots. The scenario was completely different, but it still showed how effective subplots can be.
Tweaking the chart above could render the following.
Why do I think this plot is better than the clustered bar chart?
- The horizontal bar charts now allow for a more legible rendering of the data labels.
- There is no need for colour coding. Thanks to the subplot titles, one can easily understand which category is being displayed.
A word of caution
- Separating by subplots means that you have to select how to order the countries in the y-axis. This is done by choosing a specific category. In this case, I chose “agriculture”, which means that the other 3 categories don’t preserve their natural order, making comparisons difficult.
- The magnitude of the bars may (or may not) be kept across all subplots. In this case, we didn’t normalise the magnitudes. What I mean is that the range of values, from min to max, is determined for each individual subplot. The consequence is that you have bars of value 12 (see “construction” for India) rendered much larger than a bar of value 30 (see “services” for India).
However, even with its flaws, the subplot chart is much more legible than the first clustered bar chart.
Alternative #2. Using a stacked bar chart.
Now, say that what you want to convey is how skewed (or not) the distribution of the workforce by country is. As we saw in the subplot alternative, this is a bit difficult, as each bar chart is rendered differently by each subplot. The subplot alternative was great to answer that “India has the biggest % of their workforce dedicated to construction, with Nigeria having the smallest”, but it is much more difficult to answer that “construction and services represent 77% of India’s workforce”.
Check the stacked bar below and decide which one you prefer.
Why do I think this plot is better than the clustered bar chart?
- Stacked bar charts help pin a story for each of the rows in the y-axis. Now, even what was relatively easy to understand in the clustered bar chart becomes much easier to understand in the stacked one.
- The horizontal bar charts now allow for a more legible rendering of the data labels.
- Because you are dealing with percentages, stacked bar charts can really convey that numbers add up to 100%.
- Finally, all bars have the correct magnitudes.
A word of caution
- Similarly to the subplot alternative, you still have to select how to order the countries in the y-axis.
- A colour coded legend is required, so some extra processing time is required.
Again, despite its issues, I hope you will agree with me that the stacked bar chart is much more legible than the first clustered bar chart.
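Tips on how to create these 2 plots
Creating a subplot chart
- 1st, you will of course need to create a subplot object
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# One column per category, all sharing the same y-axis (the countries)
fig = make_subplots(
    rows=1, cols=4,
    shared_yaxes=True,
    subplot_titles=list_of_categories,
)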
- 2nd, simply loop through each category and plot each bar chart on the specific “column trace”
for i, category in enumerate(list_of_categories, start=1):
    # Keep only the rows for the sector plotted in this column
    aux_df = df[df['sector'] == category]
    fig.add_trace(
        go.Bar(
            y=aux_df['country'],
            x=aux_df['percentage'],
            marker=dict(color='darkblue'),
            text=aux_df['percentage'].round(0),
            textposition='auto',
            orientation='h',
            showlegend=False,
        ),
        row=1, col=i
    )
Creating a stacked bar chart
- 1st, decide the order of the main category
df = df.sort_values(['sector', 'percentage'], ascending=[True, True])
- 2nd, loop through each category. Extract the information by country and add a trace.
fig = go.Figure()
for sector in df['sector'].unique():
    aux_df = df[df['sector'] == sector].copy()
    fig.add_trace(
        go.Bar(
            x=aux_df['percentage'],
            y=aux_df['country'],
            orientation='h',
            name=sector,  # shown in the legend
            text=aux_df['percentage'],
            textposition='auto',
        )
    )
- 3rd, you need to tell plotly that this is a stacked bar chart. You can do this in the update_layout method.
fig.update_layout(barmode='stack')
As Shakespeare would have said had he worked as a data analyst: to subplot or to stack?
Scenario 2: How on earth to plot 5 countries against 7 types of food?
In this second scenario, you are working with multi-category data that represents how much of each type of food is exported by each country as a percentage of its total production. In the dataset below, you will have information about 5 countries and 7 types of food. How would you convey this information?
The default output which is doomed to fail
You try out what the default output from plotly.express would show. And what you see is not something you like.
Option 1. Put the countries in the x-axis with the food categories in the legend
Option 2. Put the food categories in the x-axis and countries in the legend
You will agree with me that neither chart can be used to tell a clear story. Let’s see what happens if we use a stacked bar chart as above.
Alternative #1. The stacked bar chart (which in this case, fails)
The stacked bar chart served us well in the previous scenario. Can it also help us here, where we have more categories and where the sum of the percentages is not equal to 100%?
Check the bar charts below:
Option 1. Put the countries in the x-axis with the food categories in the legend
Option 2. Put the food categories in the x-axis and countries in the legend
Both stacked charts really fail to simplify what we are looking at. In fact, I would argue they are as difficult to read as the clustered bar chart. So, in this case, stacked bar charts have actually failed us.
Alternative #2. The dot plot (which is a fancy scatter plot)
This alternative I am about to present is inspired by the subplot idea we used in scenario 1. However, in this case, I will replace the bars with dots.
One thing I didn’t like from scenario 1 was that the magnitude of bars didn’t make sense across the different categories. Each subplot had its own x-axis range.
Now, what do you think of this dot plot approach for clearer data storytelling?
Why do I think this plot is better?
- Dot plot magnitudes are kept constant across the board.
- Given that I have a country (Netherlands) which surpasses the rest, I feel the dots convey this superiority better, even more when I colour them differently.
- Having these subplots arranged as a table makes things look aligned and neat. In other words, it’s easy to scan for answers at the country level or at the food category level.
- No colour coding required! And we can use emojis!
Tips on how to create this plot
- 1st, create a subplots object. I have defined the titles with emojis using a dictionary
list_of_categories = df['Food'].unique().tolist()
list_of_categories.sort()
# Emoji shown next to each food name in the subplot titles
food_to_emoji = {
    'Cucumbers': '🥒',
    'Eggs': '🥚',
    'Mushrooms': '🍄',
    'Onions': '🧅',
    'Peppers': '🌶️',
    'Potatoes': '🥔',
    'Tomatoes': '🍅'
}
subplot_titles = [f"{category} {food_to_emoji.get(category, '')}" for category in list_of_categories]
fig = make_subplots(rows=1, cols=7, shared_yaxes=True,
                    subplot_titles=subplot_titles
                    )
- 2nd, how to add 1 single data point for each combination of {country}-{food}? Loop through the food categories, but in the x-axis force plotting a dummy value (I used the number 1)
for i, feature in enumerate(list_of_categories):
    c = i + 1
    if c == 1:
        # Only the first subplot is sorted (the y-axis is shared across subplots)
        aux_df = df[df['Food'] == feature].sort_values('percentage', ascending=False).copy()
    else:
        aux_df = df[df['Food'] == feature].copy()
    # text, textposition, textfont and marker are built from aux_df (see the 3rd step)
    fig.add_trace(
        go.Scatter(
            y=aux_df['Country'],
            x=[1] * len(aux_df),  # <---- forced x-axis
            mode='markers+text',
            text=text,
            textposition=textposition,
            textfont=textfont,
            marker=marker,
            showlegend=False,
        ),
        row=1, col=c
    )
- 3rd, but if you plot the value 1, how do you show the real food % values? Easy, you define these in the text, textposition, textfont and marker parameters.
# Small values get a grey label above the dot; larger ones a white label inside it
text = [f"{val}%" for val in aux_df['percentage'].round(0)]
textposition = ['top center' if val < 10 else 'middle center' for val in aux_df['percentage']]
textfont = dict(color=['grey' if val < 10 else 'white' for val in aux_df['percentage']])
# Dot size scales with the percentage; the highlighted country gets its own colour
marker = dict(size=aux_df['percentage'] * 3,
              color=['rgb(0, 61, 165)' if country == 'Netherlands 🇳🇱' else 'darkgrey' for country in aux_df['Country']])
Scenario 3: Clustered charts fail to convey change over 2 periods for 2 groups.
In both scenarios above, we were dealing with multiple categories and saw how clustered bar charts hinder our ability to quickly understand the message we are trying to convey. In this last scenario, we cover the case where you only have 2 categories. In this case, 2 different periods in time (it could be 2 segments, 2 areas, etc)
Because the cardinality is so small (only 2), I’ve seen many people still using stacked bar charts. Check the data below. It represents the ranking that different Spanish football teams have held in 2 different seasons.
If you plotted the teams in the x-axis, the ranking in the y-axis and the season as a coloured legend, we would have the following plot.
Where do I think this plot has issues?
- Rankings are not well represented with bar charts. For example, here a larger ranking is worse (ie, rank = 1 is much better than rank = 50)
- It is not easy to compare rankings for the season 2023–2024. This is because we have sorted the chart in ascending order based on the 2013–2014 season.
- There are teams which had a UEFA ranking in season 2013–2014, but didn’t in 2023–2024 (Malaga). This is not immediately apparent from the chart.
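Before charting, the last two issues can be checked directly on the data. The sketch below is an illustration only: it uses made-up clubs and rankings, and assumes a long-format table with `club`, `season` and `ranking` columns and season labels written as `2013-14` and `2023-24`.

```python
import pandas as pd

# Illustrative, made-up rankings (NOT the real UEFA data).
df = pd.DataFrame({
    "club": ["Club A", "Club A", "Club B", "Club B", "Club C"],
    "season": ["2013-14", "2023-24", "2013-14", "2023-24", "2013-14"],
    "ranking": [5, 40, 20, 12, 30],
})

# One row per club, one column per season
wide = df.pivot(index="club", columns="season", values="ranking")

# Positive change = the ranking number grew, i.e. the club got worse
wide["change"] = wide["2023-24"] - wide["2013-14"]

# Clubs that were ranked in the first season but not in the second
dropped_out = wide.index[wide["2023-24"].isna()].tolist()
```

A clustered bar chart hides both the size of `change` and the clubs in `dropped_out`, which is why a different chart type is worth considering here.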
The slope graph alternative
I always move to slope graphs when I need to visualise rank comparison or any story of change (ie, this datapoint has travelled from here to here). Change doesn’t have to be over time (although it is the most common type of change). A slope graph could be used to compare 2 scenarios, 2 opposite views, 2 geographies, etc. I really like them because your eye can easily travel from start point to end point without interruption. In addition, it makes the degree of change much more obvious. Check the chart below… it is the story of how Valencia CF completely destroyed its ranking since the arrival of a new owner.
Tips on how to create this plot
- 1st, loop through each club and plot a Scatter plot.
for club in df['club'].unique():
    club_data = df[df['club'] == club]
    # DEFINITION OF COLOUR PARAMETERS
    ...
    fig.add_trace(go.Scatter(
        x=club_data['season'],
        y=club_data['ranking'],
        mode='lines+markers+text',
        name=club,
        text=club_data['text_column'],
        # Labels sit outside the slope: left of the first season, right of the second
        textposition=['middle left' if season == '2013-14' else 'middle right' for season in club_data['season']],
        textfont=dict(color=colour_),
        marker=dict(color=color, size=marker_size),
        line=dict(color=color, width=line_width)
    ))
- 2nd, define the text, textfont, marker and line parameters.
for club in df['club'].unique():
    club_data = df[df['club'] == club]
    if club == 'Valencia':
        # Highlighted club: thicker orange line and larger markers
        color = 'orange'
        line_width = 4
        marker_size = 8
        colour_ = color
    else:
        # Every other club fades into the background
        color = 'lightgrey'
        line_width = 2
        marker_size = 6
        colour_ = 'grey'
    # go.Scatter()
    ...
- 3rd, because we are dealing with “rankings”, you can set yaxis_autorange='reversed'
fig.update_layout(
    ...
    yaxis_autorange='reversed',  # Rankings are usually better when lower
)
Summary of multi-category visualization approaches
In this post, we explored how to move beyond clustered bar charts by using more effective visualisation techniques. Here’s a quick recap of the key takeaways:
Scenario 1: subplots vs. stacked bars
- Subplots: Best for category-specific comparisons, with clear labels and no need for colour-coded legends.
- Stacked Bars: Ideal for showing cumulative distributions, with consistent bar magnitudes and intuitive 100% totals.
Scenario 2: dot plot for high cardinality
- When dealing with multiple categories both in the x and y axis, dot plots offer a cleaner view.
- Unlike subplots or stacked bars, dot plots keep magnitudes constant and comparisons clear.
Scenario 3: slope graphs for two-point comparisons
- For tracking changes between two points, slope graphs clearly show movement and direction.
- They highlight upward, downward, or stable trends in a single glance.
Where can you find the code?
In my repo and the live Streamlit app:
Acknowledgements
Further reading
Thanks for reading the article! If you are interested in more of my written content, here is an article capturing all of my other blog posts organised by themes: Data Science team and project management, Data storytelling, Marketing & bidding science and Machine Learning & modelling.
Stay tuned!
If you want to get notified when I release new written content, feel free to follow me on Medium or subscribe to my Substack newsletter. In addition, I would be very happy to chat on Linkedin!
Originally published at https://joseparreogarcia.substack.com.