Reading List

The most recent articles from a list of feeds I subscribe to.

Style-observer: JS to observe CSS property changes, for reals

Style-observer social media teaser

I cannot count the number of times in my career I wished I could run JS in response to CSS property changes, regardless of what triggered them: media queries, user actions, or even other JS.

Use cases abound. Here are some of mine:

  • Implement higher level custom properties in components, where one custom property changes multiple others in nontrivial ways (e.g. a --variant: danger that sets 10 color tokens).
  • Polyfill missing CSS features
  • Change certain HTML attributes via CSS (hello --aria-expanded!)
  • Set CSS properties based on other CSS properties without having to mirror them as custom properties

The most recent time I needed this was to prototype an idea I had for Web Awesome, and I decided this was it: I’d either find a good, bulletproof solution, or I would build it myself.

Spoiler alert: Oops, I did it again

A Brief History of Style Observers

The quest for a JS style observer has been long and torturous. Many have tried to slay this particular dragon, each getting us a little bit closer.

The earliest attempts relied on polling, and were thus prohibitively slow. Notable examples were ComputedStyleObserver by Keith Clark in 2018 and StyleObserver by PixelsCommander in 2019.

Jane Ori first asked “Can we do better than polling?” with her css-var-listener in 2019. It parsed the selectors of relevant CSS rules, and used a combination of observers and event listeners to detect changes to the matched elements.

Artem Godin was the first to try using transition events such as transitionstart to detect changes, with his css-variable-observer in 2020. In fact, for CSS properties that are animatable, such as color or font-size, using transition events is already enough. But what about the rest, especially custom properties, which are probably the top use case?
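For animatable properties, the technique boils down to something like this (a minimal sketch; the element and property names are placeholders of mine):

```js
// Minimal sketch: detecting changes to an *animatable* property (here color)
// via transition events. A near-zero duration is enough for events to fire.
const el = document.querySelector(".target");
el.style.transition = "color 1ms";

el.addEventListener("transitionstart", (evt) => {
  if (evt.propertyName === "color") {
    console.log("color changed to", getComputedStyle(el).color);
  }
});
```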

In addition to pioneering transition events for this purpose, Artem also concocted a brilliant hack to detect changes to custom properties: he stuffed them into font-variation-settings, which is animatable regardless of whether the axes specified corresponded to any real axes in any actual variable font, and then listened to transitions on that property. It was brilliant, but also quite limited: it only supported observing changes to custom properties whose values were numbers (otherwise they would make font-variation-settings invalid).
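A rough sketch of the hack, with made-up names (the real library handled many more details):

```js
// Sketch of the hack: "HACK" is an arbitrary four-letter axis tag; it need
// not exist in any real font for font-variation-settings to animate.
const el = document.querySelector(".target");
el.style.fontVariationSettings = '"HACK" var(--progress, 0)';
el.style.transition = "font-variation-settings 1ms";

// When --progress changes to a new number, font-variation-settings animates,
// producing transition events even though --progress itself is not animatable.
el.addEventListener("transitionstart", () => {
  console.log("--progress is now", getComputedStyle(el).getPropertyValue("--progress"));
});
```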

The next breakthrough came four years later, when Bramus Van Damme pioneered a way to do it “properly”, using the (then newly Baseline) transition-behavior: allow-discrete, after an idea by Jake Archibald. His @bramus/style-observer was the closest we’ve ever gotten to a “proper” general solution.
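In broad strokes, the technique looks something like this sketch (property and element names are mine):

```js
// Sketch: observing a custom property via a discrete transition.
const el = document.querySelector(".target");

// Discrete properties don't interpolate, but with allow-discrete they still
// produce a (step) transition, and therefore transition events.
el.style.transition = "--variant 1ms allow-discrete";

el.addEventListener("transitionstart", (evt) => {
  if (evt.propertyName === "--variant") {
    console.log("--variant changed to", getComputedStyle(el).getPropertyValue("--variant"));
  }
});
```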

Releasing his work as open source was already a great service to the community, but he didn’t stop there. He stumbled on a ton of browser bugs, which he did an incredible job of documenting and then filing. His conclusion was:

Right now, the only cross-browser way to observe Custom Properties with @bramus/style-observer is to register the property with a syntax of “<custom-ident>”. Note that <custom-ident> values can not start with a number, so you can’t use this type to store numeric values.

Wait, what? That was still quite the limitation!

My brain started racing with ideas for how to improve on this. What if, instead of trying to work around all of these bugs at once, we detect them so we only have to work around the ones that are actually present?

World, meet style-observer

At first I considered just sending a bunch of PRs, but I wanted to iterate fast, and to change too many things for that to be practical. I took the fact that the domain observe.style was available as a sign from the universe, and decided the time had come for me to take my own crack at this age-old problem, armed with the knowledge of those who came before me and with the help of my trusty apprentice Dmitry Sharabin (hiring him to work full-time on our open source projects is a whole separate blog post).

One of the core ways style-observer achieves better browser support is that it performs feature detection for many of the bugs Bramus identified. This way, code can work around them in a targeted way, rather than the same code having to tiptoe around all possible bugs. As a result, it basically works in every browser that supports transition-behavior: allow-discrete, i.e. 90% globally.
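For illustration, usage could look roughly like the following; treat the constructor signature and record shape here as my assumptions rather than documentation:

```js
// Hypothetical usage sketch; the exact API may differ, so check the docs.
import StyleObserver from "style-observer";

const observer = new StyleObserver((records) => {
  for (const { target, property, value, oldValue } of records) {
    console.log(property, "on", target, "changed:", oldValue, "→", value);
  }
});

observer.observe(document.querySelector(".target"), "--variant");
```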

Keep Calm and Style Observe

  • Safari transition loop bug (#279012): StyleObserver detects this and works around it by debouncing.
  • Chrome unregistered transition bug (#360159391): StyleObserver detects this bug and works around it by registering the property, if unregistered.
  • Firefox no initial transitionstart bug (#1916214): By design, StyleObserver does not fire its callback immediately (i.e. works more like MutationObserver than like ResizeObserver). In browsers that do fire an initial transitionstart event, it is ignored.
In addition, while working on this, we found a couple more bugs.

Beyond better browser support, style-observer also supports throttling and aggregation, and plays more nicely with existing transitions.

Since this came out of a real need, to (potentially) ship in a real product, it has been exhaustively tested, and comes with a test suite of over 150 unit tests (thanks to Dmitry’s hard work).

If you want to contribute, one area we could use help with is benchmarking.

That’s all for now! Try it out and let us know what you think!

Gotta end with a call to action, amirite?

Context Chips in Survey Design: “Okay, but how does it _feel_?”

Minimalistic skeleton diagram showing the concept presented in this article

One would think that we’ve more or less figured survey UI out by now. Multiple choice questions, checkbox questions, matrix questions, dropdown questions, freeform textfields, numerical scales, what more could one possibly need?!

And yet, every time Google sponsored me to lead one of the State Of … surveys, and especially the inaugural State of HTML 2023 Survey, I kept hitting the same wall: the established options for answering UIs felt woefully inadequate for balancing the collection of good insights with minimal friction for end-users.

The State Of surveys used a completely custom survey infrastructure, so I could often (but not always) convince engineering to implement new question UIs. After joining Font Awesome, I somehow found myself leading yet another survey, despite swearing never to do this again. 🥲 Alas, building a custom survey UI was simply not an option in this case; I had to make do with the existing options out there [1], so I felt this kind of pain to my core once again.

So what are these cases where the existing answering UIs are inadequate, and how could better ones help? I’m hoping this case study will be Part 1 of a series on how survey UI innovations can help balance the tradeoffs between user experience and data quality. This is definitely the installment I’m most proud of: it was such a bumpy ride, but it was all worth it in the end.

The Problem

For context, the body of State Of surveys is a series of “Feature questions”, which present the respondent with a certain web platform feature and ask if they had heard of it or used it. Feature questions look like this:

An example of a feature question from the State of CSS 2022 survey.

Respondents get a score in the end, based on how many of these they had heard of or used. Each survey had dozens of these questions. Based on initial estimates, State of HTML was going to have at least fifty.

This was my score. We revamped the scoring system for this iteration and switched from a percentage to a point-based score, since not all questions were equally weighted.

Respondents love these questions. They learn about new things they may not have heard of, and get to test their knowledge. But also, from the survey designer’s perspective, they gamify a (very long) survey, increasing completion rates, and provide users incentive to share their score on social media, spreading the word.

One would expect that they also provide valuable data, yet browser vendors had repeatedly mentioned that this data was largely useless to them. Surveys were all about what people felt, not what they knew or had used — they had better ways to gauge the latter. Instead, the reason they funneled thousands into funding these surveys every year was the one or two pain-point questions towards the end. That was it. Survey data on experience and awareness could be useful, but only if it was accompanied by subjective sentiment data: if respondents hadn’t used or heard of a feature, were they interested? If they had used it, how did it feel?

In an attempt to address this feedback, a button that opened a freeform comment field had been introduced the year prior, but response rates were abysmally low, starting at 0.9% for the first question [2] and dropping further along the way. This was no surprise to me: freeform questions have a dramatically lower response rate than structured questions, and hidden controls get less interaction (“out of sight, out of mind”). But even with a high response rate, freeform comments are notoriously hard to analyze, especially when they are this domain-specific.

Ideation

Essentially, the data we needed to collect was a combination of two variables: experience and sentiment. Collecting data on two variables is common in survey design, and typically implemented as a matrix question.

                    🤷   👍   👎
Never heard of it
Heard of it
Used it

If user experience and cognitive load were not a concern, the same data could actually be collected with a matrix question.

Indeed, if there were only a couple such questions, a matrix could have been acceptable. But …could you imagine filling out 50 of these?

An acceptable solution needed to add minimal friction for end-users: there were at least 50 such questions, so any increase in friction would quickly add up — even one extra click was pushing it. And we needed a sufficiently high response rate to get a good confidence interval. But it also needed to facilitate quantitative data analysis. Oh, and all of that should involve minimal engineering effort, as the (tiny) engineering team was already stretched thin.

Did I hear anyone say overconstrained? 😅

Idea 1: Quick context

Initially, I took these constraints to heart. The comment field and the infrastructure around it already existed, so, misguided as it may have been, I designed a UI that revealed relevant positive/negative sentiment options using contextual progressive disclosure. These inserted predefined responses into the comment field with a single click.

Being a purely client-side interaction meant it could be implemented in a day, and it still kept end-user friction at bay: providing sentiment was optional and only required a single click.

In theory, quantitative data analysis was not optimally covered, as freeform responses are notoriously hard to analyze. However, based on the psychology of user behavior, I hypothesized that the vast majority of users would not edit these responses at all, a minority would append context, and an even tinier minority would actually edit them. This meant we could analyze them via simple string matching and only lose a few false negatives.
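As a toy illustration of that analysis (the predefined strings and the matching rule are made up for the example):

```js
// Toy sketch: unedited responses match exactly, appended context still
// matches via startsWith, and fully edited responses become rare false
// negatives.
const predefined = ["Positive experience", "Negative experience"];

function classifySentiment(comment) {
  return predefined.find((p) => comment.startsWith(p)) ?? null;
}

classifySentiment("Positive experience");                  // "Positive experience"
classifySentiment("Positive experience, but buggy in FF"); // "Positive experience"
classifySentiment("Loved it!");                            // null (false negative)
```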

Mockup of the quick context idea.

I was very proud of myself: I had managed to design a solution that satisfied all constraints, a feat that initially seemed impossible! Not to mention this design gently guided users towards using the comment field, which could motivate them to add even more context.

Yet, when I presented my mocks to the team, engineering hated the idea with a passion. The lead engineer (who was also the project founder) found the idea of turning a structured interaction into unstructured data deeply unsettling. So much so that it motivated him to implement a whole backend to store these followups properly, something I had initially thought was out of the question.

So now what? Back to the drawing board, but with one constraint lifted!

Ideas 2 & 3: Followups and sentiment radios

Mockups of intermediate ideas. Left: Followups (by lead engineer) Right: Sentiment radios (by Google PM)

This new backend came with a UI proposal that raised red flags for both me and the Google PM I was collaborating with (one of the survey’s core stakeholders, but not the main one). Even seeing the followup UI required an extra click, so it was guaranteed to have a low response rate. It would have been better than the 0.9% of the comment field (clicking is easier than typing!), but still pretty low (I would estimate < 15%). And even when users were intrinsically motivated to leave feedback, two clicks and a popover was a steep price to pay.

Another idea came from the Google PM: sentiment radios. It was an attempt to simplify the interaction by framing it as a two step process: first experience, then sentiment, through radio buttons that slid down once a main answer was selected. However, I was very concerned that such a major UI shift after every single answer would quickly become overwhelming over the course of the survey.

Idea 4: Context chips

Back to the drawing board, I asked myself: if I had infinite engineering resources, what UI would I design? The biggest challenge was reducing friction. All ideas so far had required at least one extra (optional) click to select sentiment. Could we do better? What if users could select both experience and sentiment with a single click?

Guided by this, I designed a UI where sentiment is selected via “context chips” that are actually part of the answer: clicking a chip also selects the answer it accompanies, letting users express an answer across both variables with a single click, or just select the answer itself to express no sentiment. To reduce visual clutter, the chips only faded in on hover. Additionally, clicking the selected chip a second time would deselect it, fixing a longstanding UX issue with radio buttons [3].
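Here is a minimal behavior sketch of that interaction; the markup and class names are hypothetical:

```js
// Hypothetical markup for one answer:
// <label>
//   <input type="radio" name="q1" value="used"> Used it
//   <button type="button" class="chip" value="positive">👍</button>
// </label>
for (const chip of document.querySelectorAll(".chip")) {
  chip.addEventListener("click", () => {
    const label = chip.closest("label");

    // Clicking a chip also selects the answer it accompanies
    label.querySelector("input[type=radio]").checked = true;

    // Clicking the selected chip a second time deselects the sentiment
    const wasSelected = chip.classList.contains("selected");
    for (const c of label.querySelectorAll(".chip")) {
      c.classList.remove("selected");
    }
    chip.classList.toggle("selected", !wasSelected);
  });
}
```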

Over the course of designing this, I became so convinced it was the right solution, that I implemented a high fidelity prototype myself, complete with code that could be easily adapted to the infrastructure used by the survey app.

The context chips prototype on desktop and mobile.

There were so many things I loved about this design, even beyond the core idea of answering both variables with a single click. There were no layout shifts, the followups were in close proximity to the main answer, and the styling of the chips helped build a visual association to reduce friction even more as you go. I was not a huge fan of the mobile version, but I couldn’t think of a much better way to adapt this UI to mobile.

Early alternative concept that supported followups. This was deemed too complicated and was abandoned early on.

Reception of context chips was not what I had hoped at first. I had expected pushback due to the engineering effort needed, but folks also had other concerns: that users would find things appearing on hover distracting and feel “ambushed”, that the UI was too “weird”, that users would not discover the 1-click interaction and use it as a two-step process anyway, and that response rate would be low because these chips were not visible upfront.

Mini-feature questions: Context Chips + Checkboxes?

Around the same time as designing context chips, I had a relevant realization: we don’t actually need to know both awareness and usage for all features.

For old, widely supported features, awareness doesn’t matter, because even when it’s low, it has plateaued. And for features that are so new they have not yet been implemented in browsers, usage is largely meaningless. For these cases, each feature only has two states, and thus experience can be expressed with a checkbox! This would allow us to combine questions about multiple features in one, and we could still use context chips, albeit a little differently:

The mini features prototype on desktop and mobile.

While these could be used for questions that discern either usage or awareness, we decided to stick to the former, as there was a (valid) concern that having mini-feature questions whose checked state meant different things could be confusing and lead to errors. That way, only old, lower-priority features would be relegated to this template, and new features, which tend to be higher priority for browser vendors, would still get the full UI, comments and all. Instead, to improve the experience for cutting-edge features, we introduced a “Not implemented” tag next to the “Used it” option.

One disadvantage of the mini feature UI is that due to the way context chips work, it is not possible to select sentiment for features you have not used: once you click on a chip, it also selects the feature, as if you had clicked on its label. I guess it could be possible to click on a chip and then uncheck the feature, but that would be a very weird interaction.

Idea 5: Existing 5-point question template

At this point, the lead engineer dredged up a question template that had been used in other surveys to ask about the respondent’s experience with various types of tooling. Instead of separating experience and sentiment, it used a 5-point scale where each answer except the first answered both questions.

The existing 5-point question template.

The eng lead was sold: zero engineering effort! The Google PM was also sold: 100% response rate! (since it was not possible to avoid expressing sentiment for features you had heard of or used).

I had serious reservations.

  • There are arguments for even-numbered Likert scales (no neutral option), but these always involve scales of at least 4 points. If you force people to choose between only two states, positive or negative, you’re simply going to get garbage data: neutral votes get pushed into positive votes, and the data around positive sentiment becomes useless.
  • These did not allow users to express sentiment for features they had not heard of, despite these questions often including enough info for users to know whether they were interested.
  • I was worried that increasing the number of upfront answers to 5 would increase cognitive load too much — and even scrolling distance, by 66%!

A UX researcher we were working with even did a heuristic evaluation that somewhat favored the 5-point template mainly on the basis of being a more familiar UI. The odds seemed stacked against context chips, but the upcoming usability testing could still tip the scales in their favor.

Usability Testing to the Rescue!

The lead engineer remained unconvinced about the merits of context chips, and was adamant that even adapting my fully functional prototype was too much work. But since the prototype existed, we decided to user-test it against the 5-point question and see how it compared.

We ran a within-subjects usability study with 6 participants (no, they are not too few) recruited via social media. Half of the survey used the 5-point template, the other half context chips. The order of the conditions was randomized to avoid order effects.

In addition to their actual experience, we also collected subjective feedback at the end of the survey, showing them a screenshot of each answering UI and asking how each felt.

What worked well: Context Chips

I have run many usability studies in the last ten years, and I have never seen results as resounding as this one. So much so that we unanimously agreed to switch to context chips after the 5th participant, because the scales were so tipped in favor of context chips that nothing that happened in the last session could have tipped them the other way.

The lead engineer observed some of the sessions, and this was instrumental in changing his mind. This was not a coincidence: when engineering is unconvinced that a certain UI is worth the implementation complexity, it can be a good strategy to have them observe usability testing sessions. Not only does it help prove the value to them, it also builds long-term user empathy, which makes future consensus easier. Given the unfortunate lack of HCI prioritization in Computer Science curricula, this may even be their first exposure to usability testing.

All of my concerns about the 5-point template were brought up by participants of their own accord, repeatedly:

  • All participants really liked being able to express sentiment, and were vocal about their frustration when they could not express it.
  • All but one participant (4/5) complained about being forced into selecting a sentiment when they had no opinion.
  • Some participants even mentioned that the 5-point template felt overwhelming.

Furthermore, none of the concerns about context chips were validated:

  • No-one found the chips appearing on hover distracting or felt “ambushed”.
  • No-one struggled to understand how to use them.
  • Everyone discovered the 1-click interaction pretty fast (typically within the first 2-3 questions). But interestingly, they still chose to use it as a two-step process for some of the questions, presumably to reduce cognitive load for difficult to answer questions by breaking down the decision into two smaller ones. The fact that this UI allowed users to make their own efficiency vs cognitive load tradeoffs was an advantage I had not even considered when designing it!
  • Response rate was generally high — when people did not select sentiment, it was because they genuinely had no opinion, not because they couldn’t be bothered.

What worked okay: Mini-feature Questions

Mini-feature questions did successfully help cut down response time per feature by 75%, though this came at a cost: Once more, we saw that participants really wanted to express sentiment, and were frustrated when they couldn’t, which was the case for features they had not used. Regardless, we agreed that the tradeoff was worth it for the low-priority questions we were planning to use mini-features for.

What did not work: Context Chips on Mobile

A blind spot in our testing was that we did not test the UI on mobile. Usability tests were conducted remotely via video call, so it was a lot easier to get participants to use their regular computers. Additionally, stats for previous surveys showed that mobile use was a much smaller percentage in these surveys than for the web in general (~25%), so we did not prioritize it.

This was a mistake in itself: using current usage stats to inform prioritization may seem like a great idea, but it can be prone to reverse causality bias. Meaning, are people not using these surveys on mobile very much because they genuinely don’t need to, or because the experience on mobile is subpar? We often see this with accessibility too: people claim that it doesn’t matter because they don’t have users with disabilities, but often they don’t have users with disabilities because their site is not accessible!

But even if 75% of users genuinely preferred to take these surveys on desktop (which is plausible, since they are long and people often do them in increments), we should at least have done a few sessions on mobile — 25% is not negligible!

Once responses started coming in, we realized that participants had trouble understanding what the up and down arrows meant, since on mobile these are shown without labels until selected. This would have been an easy fix had it been caught early; e.g. thumbs up/down icons could have been used instead. This was not a huge issue, as their purpose becomes clear when selected, but it definitely adds some friction.

Aftermath: Context Chips in the Wild

In our usability testing, we had seen a high response rate for sentiment (% of question respondents who selected sentiment), but that is no guarantee things will play out that way post-launch as well. When participants know they are being watched they are always more willing to engage and pay a lot more attention, no matter how much you emphasize that they should act naturally when briefing them. That’s not their failing; it’s simply human nature.

Indeed, sentiment response rates in the real-world were lower than those observed in the usability study, but still high — ranging from 24% to 59% and averaging 38% (with the same median) per question, meaning that out of every ten participants that answered each question, approximately four also provided a sentiment. This was more than enough to draw conclusions. In fact, context chips were deemed such a success they were later adopted by all other State Of surveys, even at the cost of continuity with previous years.

Against expectations, participants were just as likely to express sentiment for features they had never heard of, and in fact marginally more likely than for features they had simply heard of. In general response rates were pretty uniform across all experiences:

Experience           Average   Median
Never heard of it    37.3%     37.4%
Heard of it          37.3%     36.9%
Used it              39.0%     40.1%
Overall              37.6%     37.9%

Sentiment response rates overall and by experience (per question).

As we had observed in the user study as well, participants were far more likely to express positive rather than negative sentiment (possibly a case of acquiescence bias). Here are some interesting stats:

  • The feature with the most negative sentiment overall across all experiences (<model>) still only had 10% of respondents expressing negative sentiment for it, and still had 2.4x more positive sentiment than negative (24.7% vs 10.4%)!
  • In contrast, the feature with the most positive sentiment (<datalist>) got a whopping 55% of respondents expressing positive sentiment (and only 4% negative).
  • In fact, even the feature with the least positive sentiment (Imperative Slot Assignment 🤔) still had way more positive sentiment (17.24%) than <model> had negative sentiment (10%)!
  • If we look at the ratio of positive over negative sentiment, it ranged from 64x (!) more positive than negative sentiment (45% vs 0.7% for landmark elements) to a mere 2.4x more positive sentiment (24.7% vs 10.4% for <model>), and was 11x on average (8x median).

The analysis presented in this section includes data from 20K respondents, which was most of the whole dataset (around 90%) but not all, as it was done before the survey closed.

Results Visualization

Presenting all this data was another challenge. When you have two variables, you ideally want to be able to group results by either: e.g. you may be more interested in the total negative sentiment for a feature, or in how many people had used it, and the results visualization should support both. How do we design a results display that facilitates this? Thankfully, the visualizations for State Of surveys were already very interactive, so interactivity was not out of the question.

I was no longer involved by then but consulted in a volunteer capacity. My main advice was to use proximity for clear visual grouping, and to use a consistent visual association for bars that represented the same bit of data, both of which they followed. This was the rendering they settled on for the results:

Interactive bar chart presenting two variables at once.

I think in terms of functionality this works really well. The visual design could be improved to communicate the information architecture better and appear less busy at first glance, but given that there is no dedicated designer on the team, I think they did a fantastic job.

Generalizability

While this UI was originally designed to collect sentiment about the selected option in a multiple choice question, I think it could be generalized to improve UX for other types of two-variable questions. Generally, it can be a good fit when a question collects data across two variables and:

  1. The second variable is optional and lower priority than the first.
  2. The first variable is exclusive (single) choice, i.e. not a checkbox question.

The core benefit of this approach is the reduction in cognitive load. It is well established that matrix questions are more overwhelming. This design allows questions to initially appear like a simple multiple choice question, and only reveal the UI for the second variable upon interaction. Additionally, while matrix questions force participants to decide on both variables at once, this design allows them to make their own tradeoff of cognitive load vs efficiency, treating the UI as single step or two step as they see fit.

Another benefit of this design is that it allows for option labels to be context-dependent for the second variable, whereas a matrix limits you to a single column header. In the sentiment case, the labels varied depending on whether the feature had been used (“Positive experience” / “Negative experience”) or not (“Interested” / “Not interested”).

The more such questions a survey has, the bigger the benefits — if it’s only about a couple questions, it may not be worth the implementation complexity, though I’m hoping that survey software may eventually provide this out of the box so that this is no longer a tradeoff.

That said, matrix questions do have their benefits, when used appropriately, i.e. when the two requirements listed above are not met.

Matrix questions have a big edge when users need to make a single selection per row, especially when this may be the same answer for multiple rows, which means they can just tick down a whole column. Context chips do not allow users to build this positional association, as they are not aligned vertically: they are placed right after the answer text, and thus their horizontal position varies per answer. To mitigate this, it can be useful to color-code them and maintain the same color coding consistently throughout the survey, so that participants can build a visual association [4].

Context chips also enforce a clear prioritization across the two variables: the question is presented as a multiple choice question across the first variable, with the chips allowing the respondent to provide optional additional context across a second variable. Matrix questions allow presenting the two variables with the same visual weight, which could be desirable in certain cases. For example, there are cases when we don’t want to allow the respondent to provide an answer to the first variable without also providing an answer to the second.

Lessons Learned

In addition to any generalizable knowledge around survey design, I think this is also an interesting product management case study, and teaches us several lessons.

Never skimp on articulating the north star UI

Start any product design task by ignoring ephemeral constraints (e.g. engineering resources) and first reach consensus on what the optimal UI is, before you start applying constraints. Yes, you read that right. I want to write a whole post about the importance of north star UIs, because this is one of many cases over the course of my career where tight implementation constraints were magically lifted, either due to a change of mind, a change in the environment, or simply someone’s brilliant idea. Without consensus on what the north star UI is (or even a clear idea of it), you have to go back to the drawing board when this happens.

User testing is also a consensus-building tool

You probably already know that usability testing is a great tool for improving user experience, but there is a second, more strategic hidden utility to it: consensus building.

I’ve been in way, way too many teams where UI decisions were made by hypothesizing about user behavior, which not only risks missing the mark, but also leaves no way forward when people disagree. What do you do, hypothesize harder? When there is user testing data, it is much harder to argue against it.

This is especially useful in convincing engineering that a certain UI is worth the implementation complexity, and having engineers observe usability testing sessions can be an educational experience for many.

Heuristic evaluations are not a substitute for usability testing

There are many things to like about heuristic evaluations and design reviews. They can be done by a usability expert alone, and can uncover numerous issues that would otherwise have taken multiple rounds of usability testing, especially if the expert is also a domain expert. Fixing the low-hanging-fruit issues means user testing time can be spent more efficiently, uncovering less obvious problems. There are also certain types of issues that can only be uncovered by a heuristic evaluation, such as “death by a thousand paper cuts” issues: small issues that are not a big deal on their own and are too small to observe in a user study, but add up to more friction.

However, a big downside of these is that they are inherently prone to bias. They can be excellent for finding problems that may have been overlooked and are often obvious once pointed out. However (unless the number of evaluators is large), they are not a good way to decide between alternatives.


  1. Unlike at Devographics, surveys are not FA’s core business, so the impact/effort tradeoff simply wasn’t there for a custom UI, at least at this point in time. I ended up going with Tally, mainly due to the flexibility of its conditional logic and its support for code injection (which, among other things, allowed me to use FA icons — a whopping 120 different ones!). ↩︎

  2. Meaning out of the people who responded to that question about their experience with a feature, only 0.9% left a comment. ↩︎

  3. A radio group with all buttons off cannot be returned to that state by user interaction. ↩︎

  4. The ~9% of people with atypical color vision won’t benefit, but that’s okay in this case, as color is used to add an extra cue, and not an essential part of the interface. ↩︎

Web Components are not Framework Components — and That’s Okay

Disclaimer: This post expresses my opinions, which do not necessarily reflect consensus by the whole Web Components community.

A blog post by Ryan Carniato titled “Web Components Are Not the Future” has recently stirred a lot of controversy. A few other JS framework authors pitched in, expressing frustration and disillusionment around Web Components. Some Web Components folks wrote rebuttals, while others repeatedly tried to get to the bottom of the issues, so they could be addressed in the future.

When you are on the receiving end of such an onslaught, the initial reaction is to feel threatened and become defensive. However, these kinds of posts can often end up shaking things up and pushing a technology forwards in the end. I have some personal experience: after I published my 2020 post titled “The failed promise of Web Components” which also made the rounds at the time, I was approached by a bunch of folks (Justin Fagnani, Gray Norton, Kevin Schaaf) about teaming up to fix the issues I described. The result of these brainstorming sessions was the Web Components CG which now has a life of its own and has become a vibrant Web Components community that has helped move several specs of strategic importance forwards.

As someone who deeply cares about Web Components, my initial response was also to push back. I was reminded of how many times I have seen this pattern before. It is common for new web platform features to face pushback and resistance for many years; we tend to compare them to current userland practices, and their ergonomics often fare poorly at the start. Especially when there is no immediately apparent 80/20 solution, making things possible tends to precede making them easy.

Web platform features operate under a whole different set of requirements and constraints:

  • They need to last decades, not just until the next major release.
  • They need to not only cater to the current version of the web platform, but anticipate its future evolution and be compatible with it.
  • They need to be backwards compatible with the web as it was 20 years ago.
  • They need to be compatible with a slew of accessibility and internationalization needs that userland libraries often ignore at first.
  • They are developed in a distributed way, by people across many different organizations, with different needs and priorities.

Usually, the result is more robust, but takes a lot longer. That’s why I’ve often said that web standards are “product work on hard mode” — they include most components of regular product work (collecting user needs, designing ergonomic solutions, balancing impact over effort, leading without authority, etc.), but with the added constraints of a distributed, long-term, and compatibility-focused development process that would make most PMs pull their hair out in frustration and run screaming.

I’m old enough to remember this pattern playing out with CSS itself: huge pushback when it was introduced in the mid 90s. It was clunky for layout and had terrible browser support — “why fix something that wasn’t broken?” folks cried. Embarrassingly, I was one of the last holdouts: I liked CSS for styling, but was among the last to switch to floats for layout — tables were just so much more ergonomic! The majority resistance lasted until the mid '00s when it went from “this will never work” to “this was clearly the solution all along” almost overnight. And the rest, as they say, is history. 🙂

But the more I thought about this, the more I realized that, as often happens in these kinds of heated debates, the truth lies somewhere in the middle. Having used both several frameworks and several web components, and having authored both web components (most of them experimental and unreleased) and even one framework over the course of my PhD, I can see that both sides have some valid points.

Frankly, if framework authors were sold the idea that web components would be a compile target for their frameworks, and then got today’s WC APIs, I understand their frustration. Worse yet, if every time they tried to explain that this sucks as a compile target they were told “no you just don’t get it”, heck I’d feel gaslit too! Web Components are still far from being a good compile target for all framework components, but that is not a prerequisite for them being useful. They simply solve different problems.

Let me explain.

Venn diagram illustrating general vs project-specific components

Not all component use cases are the same.

I think the crux of this debate is that the community has mixed two very different categories of use cases, largely because frameworks do not differentiate between them; “component” has become the hammer with which to hammer every nail. Conceptually, there are two core categories of components:

  1. Generalizable elements that extend what HTML can do and can be used in the same way as native HTML elements across a wide range of diverse projects. Things like tabs, rating widgets, comboboxes, dialogs, menus, charts, etc. Another way to think about these is “if resources were infinite, elements that would make sense as native HTML elements”.
  2. Reactive templating: UI modules that have project-specific purposes and are not required to make sense in a different project. For example, a font foundry may have a component to demo a font family with child components to demo a font family style, but the uses of such components outside the very niche font foundry use case are very limited.

Of course, it’s a spectrum; few things in life fit neatly in completely distinct categories. For example an <html-demo> component may be somewhat niche, but would be useful across any site that wants to demo HTML snippets (e.g. a web components library, a documentation site around web technologies, a book teaching how to implement UI patterns, etc.). But the fact that it’s a spectrum does not mean the distinction does not exist.

WCs primarily benefit the use case of generalizable elements that extend HTML, and are still painful to use for reactive templating. Fundamentally, it’s about the ratio of potential consumers to authors.

The huge benefit of Web Components is interoperability: you write it once, it works forever, and you can use it with any framework (or none at all). It makes no sense to fragment efforts to reimplement e.g. tabs or a rating widget separately for each framework-specific silo; it is simply duplicated busywork.

As a personal anecdote, a few weeks ago I found this amazing JSON viewer component, but I couldn’t use it because I don’t use React (I prefer Vue and Svelte). To this day, I have not found anything comparable for Vue, Svelte, or vanilla JS. This kind of fragmentation is sadly an everyday occurrence for most devs.

But when it comes to project-specific components, the importance of interop decreases: you typically pick a framework and stick to it across your entire project. Reusing project-specific components across different projects is not a major need, so the value proposition of interop is smaller.

Additionally, the ergonomics of consuming vs authoring web components are vastly different. Consuming WCs is already pretty smooth, and the APIs are largely there to demystify most of the magic of built-in elements and expose it to web components (with a few small gaps being actively plugged as we speak). However, authoring web components is a whole different story. Especially without a library like Lit, authoring WCs is still painful, tedious, and riddled with footguns. For generalizable elements, this is an acceptable tradeoff, as their potential consumers are a much larger group than their authors. As an extreme example of this, nobody complains about the ergonomics of implementing native elements in browsers using C++ or Rust. But when using components as a templating mechanism, authoring ergonomics are crucial, since the overlap between consumers and authors is nearly 100%.
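To give a sense of the baseline boilerplate, here is a deliberately minimal vanilla custom element (an illustrative example of mine, not from any particular library):

```js
// A deliberately minimal vanilla custom element. Even this much requires
// manually wiring up rendering, attribute observation, and shadow DOM.
class StarRating extends HTMLElement {
  static observedAttributes = ["value"];

  #root = this.attachShadow({ mode: "open" });

  connectedCallback() {
    this.#render();
  }

  attributeChangedCallback() {
    this.#render(); // no built-in reactivity or templating
  }

  #render() {
    const value = Math.min(5, Math.max(0, Math.round(Number(this.getAttribute("value")) || 0)));
    this.#root.textContent = "★".repeat(value) + "☆".repeat(5 - value);
  }
}

customElements.define("star-rating", StarRating);
```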

This was the motivation behind this Twitter poll I posted a while back. I asked if people mostly made web components, used WCs that others have made, or both. Note that many people who use WCs are not aware of it, so the motivation was not to gauge adoption, but to see if the community has caught on to this distinction between use cases. The fact that > 80% of people who knowingly use web components are also web components authors is indicative of the problem. WCs are meant to empower folks to do more, not to be consumed by expert web developers who can also write them. Until this number becomes a lot smaller, Web Components will not have reached their full potential. This was one of the reasons I joined the Web Awesome project; I think that is the right direction for WCs: encapsulating complexity into beautiful, generalizable, customizable elements that give people superpowers by extending what HTML can do, so that they can be used by developers to author gorgeous UIs, by designers to do more without having to learn JS, or even by hobbyists who struggle with both (since HTML is the most approachable web platform language).

So IMO making it about frameworks vs web components is a false dichotomy. Frameworks already use native HTML elements in their components. Web components extend what native elements can do, and thus make crafting project-specific components easier across all frameworks (as well as no frameworks). I wonder if this narrative could resonate across both sides and reconcile them. Basically “yes, we may still need frameworks for nontrivial apps, but web components make their job easier” rather than pitting them against each other in a pointless comparison where everyone loses.

We will certainly eventually get to the point where web components are more ergonomic to author, but we first need to get the low-level foundations right. At this point the focus is still on making things possible rather than making them easy. The last remaining pieces of the puzzle are things like Reference Target for cross-root ARIA or ElementInternals.type to allow custom elements to become popover targets or submit buttons, both of which saw a lot of progress at W3C TPAC last week.

After that, perhaps eventually web components will even become viable for reactive templating use cases; things like the open-stylable shadow roots proposal, declarative elements, or DOM Parts are some early beginnings in that direction, and declarative shadow DOM paved the way for SSR (among other things). Then, and only then, they may make sense as a compile target for frameworks. However, that is quite far off. And even if we get there, frameworks would still be useful for complex use cases, as they do a lot more than let you use and define components. Components are not even the best reuse mechanism for every project-specific use case — e.g. for list rendering, components are overkill compared to something like v-for. And by then frameworks will be doing even more. That frameworks are always a step ahead of the web platform is true by definition, not a failing of the web platform. As Cory said, “Frameworks are a testbed for new ideas that may or may not work out.”

The bottom line is, web components reduce the number of use cases where we need to reach for a framework, but complex large applications will likely still benefit from one. So how about we conclude that frameworks are useful, web components are also useful, stop fighting and go make awesome sh!t using whatever tools we find most productive?

Thanks to Michael Warren, Nolan Lawson, Cory LaViska, Steve Orvell, and others for their feedback on earlier drafts of this post.

Making the Web more Awesome — for everyone

Folks, I have some exciting news to share. 🤩

Today I start a new chapter in my career. After a decade at MIT, teaching and doing research at the intersection of usability and programming language design, I wrapped up my PhD two weeks ago (yes, I’m a Dr now! And damn right I’ll be using the title, once it actually sinks in) and today I start my new role as Product Lead at Font Awesome.

Font Awesome + Lea Verou

I will be evaluating user needs and improving product design and usability across all company products, with an emphasis on Web Awesome, the product we are launching early next year to revolutionize how Web UIs are built, using web components and CSS in ways you’ve never seen before. Beyond improving the products themselves (all of which include extensive free & open source versions), part of my role will be to utilize my web standards experience to collect web platform pain points from across the company and translate them into new and existing web standards proposals.

Yes, I know, it’s a match made in heaven. 😍 There is even a small chance I may have been the first to create an icon font for use in a web UI via @font-face, which would make it even more wonderfully poetic that I’m joining the company that has become synonymous with icon fonts on the Web.

However, it was not my MIT PhD that led me to this role, but an email from Dave Gandy (creator & CEO of Font Awesome) about Color.js, which turned into hours of chats, and eventually a job offer for a role I could not refuse, one that was literally molded around my skills and interests.

The role is not the only reason I’m excited to join Font Awesome, though. The company itself is a breath of fresh air: open source friendly (as Dave says, “literally the only reason we have Pro versions is that we need to sustain this somehow” 😅), already profitable (= no scrambling to meet VC demands by cramming AI features nobody wants into our products), fully remote, huge emphasis on work-life balance, and an interview process that did not feel like an interview — or even a process. In fact, they did not even want to look at my resume (despite my efforts 🤣). It is telling that in their 10 years of existence, not a single person has left the company, and they have never had to let anyone go. Moreover, it bridges the best of both worlds: despite having existed for a decade, branching out to new products[1] and markets gives it a startup-like energy and excitement.

I had been extremely selective in the job opportunities I pursued, so it took a while to find the perfect role. Having ADHD (diagnosed only last year — I want to write a blog post about that too at some point), I knew it was crucial to find a job I could be passionate about: ADHD folks are unstoppable machines in jobs they love (I have literally built my career by directing my hyperfocus to things that are actually productive), but struggle way more than neurotypicals in jobs they hate. It took a while, but when I started talking with Dave, I knew Font Awesome was it.

I’m still reeling from the mad rush of spending the past couple of months averaging 100-hour weeks to wrap up my PhD before starting, but I couldn’t be more excited about this new chapter.

I’m hoping to write a series of blog posts in the coming weeks about my journey to this point. Things like:

  • How I decided that academia was not for me — but persisted to the finish line anyway because I’m stubborn AF 😅
  • How I realized that product work is my real calling, not software engineering per se (as much as I love both)
  • How I used web technologies instead of LaTeX to write my PhD thesis (and print it to PDF for submission), with 11ty plus several open source plugins, many of which I wrote; an ecosystem that I hope will one day free more people from the tyranny of LaTeX (which was amazing in the 70s, but its ergonomics are now showing their age).

But for now, I just wanted to share the news, and go off to make the web more awesome — for everyone. 🚀


  1. My excitement grew even stronger when a week before my start date, I learned that 11ty (and its creator, Zach Leatherman) had also joined Font Awesome — I think at this point every tool I use regularly is officially Awesome 😅. Yes, this site is built on 11ty as well. And even my PhD thesis! ↩︎

Forget “show, don’t tell”. Engage, don’t show!

A few days ago, I gave a very well received talk about API design at dotJS titled “API Design is UI Design” [1]. One of the points I made was that good UIs (and thus, good APIs) have a smooth UI complexity to Use case complexity curve. This means that incremental user effort results in incremental value; at no point going just a little bit further requires a disproportionately big chunk of upfront work [2].

Observing my daughter’s second ever piano lesson today made me realize how this principle extends to education and most other kinds of knowledge transfer (writing, presentations, etc.). Her (generally wonderful) teacher spent 40 minutes teaching her notation, longer and shorter notes, practicing drawing clefs, etc. Despite his playful demeanor and her general interest in the subject, she was clearly distracted by the end of it.

It’s easy to dismiss this as a 5 year old’s short attention span, but I could tell what was going on: she did not understand why these were useful, nor how they connect to her end goal, which is to play music. To her, notation was just an assortment of arbitrary symbols and lines, some of which she got to draw. Note lengths were just isolated sounds with no connection to actual music. Once I connected note lengths to songs she has sung with me and suggested they try something more hands on, her focus returned instantly.

I mentioned to her teacher that kids that age struggle to learn theory for that long without practicing it. He agreed, and said that many kids are motivated to get through the theory because they’ve heard their teacher play nice music and want to get there too. The thing is… sure, that’s motivating. But as far as motivations go, it’s pretty weak.

Humans are animals, and animals don’t play the long game, or they would die. We are programmed to optimize for quick, easy dopamine hits. The farther into the future the reward, the more discipline it takes to stay motivated and put effort towards it. This applies to all humans, but even more to kids and ADHD folks [3]. That’s why it’s so hard for teenagers to study so they can improve their career opportunities and why you struggle to eat well and exercise so you can be healthy and fit.

So how does this apply to knowledge transfer? It highlights how essential it is for students to a) understand why what they are learning is useful and b) put it in practice ASAP. You can’t retain information that is not connected to an obvious purpose [4] — your brain will treat it as noise and discard it.

The thing is, the more expert you are on a topic, the harder these are to do when conveying knowledge to others. I get it. I’ve done it too. First, the purpose of concepts feels obvious to you, so it’s easy to forget to articulate it. You overestimate the student’s interest in the minutiae of your field of expertise. Worse yet, so many concepts feel essential that you are convinced nothing is possible without learning them (or even if it is, it’s just not The Right Way™). Looking back on some of my earlier CSS lectures, I’ve definitely been guilty of this.

As educators, it’s very tempting to say “they can’t possibly practice before understanding X, Y, Z, they must learn it properly”. Except …they won’t. At best they will skim over it until it’s time to practice, which is when the actual learning happens. At worst, they will give up. You will get much better retention if you frequently get them to see the value of their incremental imperfect knowledge than by expecting a big upfront attention investment before they can reap the rewards.

There is another reason to avoid long chunks of upfront theory: humans are goal oriented. When we have a goal, we are far more motivated to absorb information that helps us towards that goal. The value of the new information is clear, we are practicing it immediately, and it is already connected to other things we know.

This means that explaining things in context as they become relevant is infinitely better for retention and comprehension than explaining them upfront. When knowledge is a solution to a problem the student is already facing, its purpose is clear, and it has already been filtered by relevance. Furthermore, learning it provides immediate value and instant gratification: it explains what they are experiencing or helps them achieve an immediate goal.

Even if you don’t teach, this still applies to you. I would go as far as to say it applies to every kind of knowledge transfer: teaching, writing documentation, giving talks, even just explaining a tricky concept to your colleague over lunch break. Literally any activity that involves interfacing with other humans benefits from empathy and understanding of human nature and its limitations.

To sum up:

  1. Always explain why something is useful. Yes, even when it’s obvious to you.
  2. Minimize the amount of knowledge you convey before the next opportunity to practice it. For non-interactive forms of knowledge transfer (e.g. a book), this may mean showing an example, whereas for interactive ones it could mean giving the student a small exercise or task. Even in non-interactive forms, you can ask questions — the receiver will still pause and think about what they would answer, even if you are not there to hear it.
  3. Prefer explaining in context rather than explaining upfront.

“Show, don’t tell”? Nah. More like “Engage, don’t show”.

(In the interest of time, I’m posting this without citations to avoid going down the rabbit hole of trying to find the best source for each claim, especially since I believe they’re pretty uncontroversial in the psychology / cognitive science literature. That said, I’d love to add references if you have good ones!)


  1. The video is now available on YouTube: API Design is UI Design ↩︎

  2. When it does, this is called a usability cliff. ↩︎

  3. I often say that optimizing UX for people with ADHD actually creates delightful experiences even for those with neurotypical attention spans. Just because you could focus your attention on something you don’t find interesting doesn’t mean you enjoy it. Yet another case of accessibility helping everyone! ↩︎

  4. I mean, you can memorize anything if you try hard enough, but by optimizing teaching we can keep rote memorization down to the bare minimum. ↩︎