Why Standards Compliance is a Tricky Notion
I just published a book about metadata, called Metadata Basics for Web Content. The book refers to many standards, and provides samples of code illustrating metadata (or structured data, if you prefer) using these standards. To locate good code examples, I relied on international organizations such as the W3C, industry working groups such as schema.org, and prominent companies such as Google.
All these sources are important ones for publishers to consult. But if you pay very close attention, you may notice that the various sources aren’t always completely aligned with one another. This is a bit disconcerting. Publishers, after all, are expected to comply with standards, and the standards themselves reference and build on each other. Yet certain details differ as you move between different actors in the standards arena. How can standards fail to align completely? To answer that question, one must consider the governance, mission, and adoption goals of the various parties involved with standards.
Publishers should recognize that no one party is in charge of metadata standards. Many parties are involved. Decisions and practices evolve organically through a combination of planning and adaptation. Different parties offer different choices.
The W3C is the largest standards body addressing web content. It has a fairly open structure. If there is sufficient interest in a topic, and enough people volunteer to work on a standards issue, then a group can be started, which can begin a process of drafting notes, recommendations, and eventually standards. The W3C doesn’t always initiate standards. Sometimes it embraces standards that have been developed by other groups. And sometimes the W3C has different groups addressing broadly similar issues, but in different ways. While W3C recommendations and standards carry tremendous weight, they do not always represent a single consensus about priorities. Generally, they skew toward accommodating a diverse range of needs, rather than enforcing a narrow set of practices. As a nonprofit body, the W3C isn’t marketing anything, or promoting adoption of one standard over another.
Many industry groups develop standards as well. An important one in the area of web content metadata is schema.org. This group started out as a partnership between search engine companies, namely Google, Bing, Yahoo and Yandex. These companies developed a core set of standards for describing common web content with metadata. Now that the core standard has been developed, schema.org has become a W3C community group. Google remains the single most important driver of schema.org’s development. But as a community effort, the standard has accepted contributions from many parties, and its scope is expanding.
In addition to international bodies and industry groups, certain companies, on account of their size and reach, shape standards practices through the implementation choices they make. They may set trends in what are deemed “best practices,” or they may recommend to others how to do things. Google again is a leading example of a single firm having a big influence on standards. As a private company, it recommends guidelines to its customers, the publishers who want their content to display in Google’s search results. These guidelines seem like standards, though they are specific to one company.
Let’s consider how different levels of standards interact with each other.
Metadata needs to be encoded using a syntax. One widely used syntax is called RDFa, which is a W3C standard.
Metadata also needs a schema to indicate entities and properties within the content. Schema.org metadata can be encoded using RDFa syntax. So we have one standard relying on another. But schema.org only uses part of the RDFa specification. There are some features in RDFa that aren’t needed when implementing schema.org. Other metadata schemas also use the RDFa syntax, and some of these take advantage of the additional features. The group designing schema.org decided to pare down what was needed to implement it in RDFa. They chose to keep things as simple as they could to help promote adoption of their schema.
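To make the layering concrete, here is a minimal sketch in Python. It assumes a made-up page fragment that describes a book with schema.org terms, using only a few RDFa attributes (vocab, typeof, and property). The sample markup, the class name RdfaLiteCollector, and the helper collect are illustrative inventions, and the collector is nowhere near a complete RDFa processor. It only shows how little of the syntax a simple schema.org description needs.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: a schema.org Book described with RDFa attributes.
# The property values are illustrative only.
SAMPLE_HTML = """
<div vocab="https://schema.org/" typeof="Book">
  <span property="name">Metadata Basics for Web Content</span>
  <span property="inLanguage">en</span>
</div>
"""


class RdfaLiteCollector(HTMLParser):
    """Collects typeof values and property/text pairs.

    This is a deliberately small sketch, not a full RDFa processor:
    it ignores nesting, prefixes, and the resource attribute.
    """

    def __init__(self):
        super().__init__()
        self.types = []
        self.properties = {}
        self._current_property = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "typeof" in attributes:
            self.types.append(attributes["typeof"])
        # Remember which property (if any) the upcoming text belongs to.
        self._current_property = attributes.get("property")

    def handle_data(self, data):
        text = data.strip()
        if self._current_property and text:
            self.properties[self._current_property] = text

    def handle_endtag(self, tag):
        self._current_property = None


def collect(html):
    """Parse an HTML string and return (types, properties)."""
    parser = RdfaLiteCollector()
    parser.feed(html)
    parser.close()
    return parser.types, parser.properties


types, properties = collect(SAMPLE_HTML)
print(types)
print(properties)
```

The schema (schema.org) supplies the vocabulary: Book, name, inLanguage. The syntax (RDFa) supplies the attributes that carry it. Swapping in a different vocabulary would leave the attribute names untouched, which is why the two standards can evolve somewhat independently.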
As mentioned earlier, Google is a key player as both a developer of schema.org and a consumer of schema.org metadata. Google evangelizes the use of schema.org metadata, and it offers guidelines and tools to help webmasters learn what they need to do. Publishers often take this advice as gospel. They presume they need to comply with Google’s standards, at least as they understand them. What they may not realize is that Google’s tools and guidelines are often advice rather than rigid rules. When developing its advice and tools, Google has chosen to focus on high priority content that many organizations produce, and provide guidelines to help webmasters ensure that they don’t make mistakes when creating metadata for such content. Google’s guidelines only cover a subset of the range of content addressed by schema.org. In effect, Google has chosen to simplify schema.org further to encourage wider adoption of it.
Google’s guidelines provide assurance that metadata complying with them will work with Google. However, it does not follow that metadata is wrong just because a publisher deviates from Google’s guidelines. Many publishers use Google’s structured data testing tool (SDTT) to validate their metadata. It’s a useful tool, but it validates only some dimensions of schema.org metadata, not all of them.
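The distinction between “not checked” and “wrong” can be sketched in a few lines of Python. Everything here is hypothetical: the set of covered types is a made-up placeholder rather than Google’s actual coverage, and tool_verdict is an invented function that does not model how the SDTT really works. The point is only that a checker limited to a subset of a schema has three possible answers, not two.

```python
# Hypothetical set of types that a vendor's testing tool knows how to check.
# This is a made-up placeholder, not Google's actual coverage.
TOOL_COVERED_TYPES = {"Article", "Event", "Recipe"}


def tool_verdict(item_type, has_required_fields):
    """Return what a subset-only checker can honestly say about an item."""
    if item_type not in TOOL_COVERED_TYPES:
        # Outside the tool's scope: no verdict, which is different from "wrong".
        return "not checked"
    return "passes" if has_required_fields else "fails"


print(tool_verdict("Recipe", True))
print(tool_verdict("Recipe", False))
print(tool_verdict("MedicalCondition", True))
```

A publisher whose content falls outside the covered set gets no verdict from such a tool, so the metadata may still be perfectly valid schema.org.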