Submitted by Damien on
TL;DR: The new Metatag 2 releases break compatibility with Metatag 1; Metatag 1 support will continue through 2023 but please update.
The Metatag module suite for Drupal has come a long way from its humble origins in 2010 as a rewrite of the Nodewords module, the main module that people used to add meta tags to their Drupal 5 and 6 sites. Dave Reid built a very flexible architecture to work with Drupal 7 in 2010 and got it to an alpha state before life and other priorities forced him to step back from active work. I joined as comaintainer in 2012 and expanded the system to cover multiple languages and content revisions, finally hitting a stable release in September 2014. When Drupal 8 came on the scene in November 2015 a version of Metatag was built for D8 that reworked the D7 CTools-driven APIs using the new PSR-4 plugins system of Drupal core’s new OOP-focused architecture, while still following the same overall philosophies.
Jump forward seven and a half years since Metatag 8.x-1.0 was released and not much has changed - both the D7 and (now) D9/10 versions of Metatag work much the same as they did upon their initial releases. Most of the changes over the years were to add more functionality - more meta tags supported, more add-ons for integrating with other modules, and more support from the community. Meanwhile the world has evolved - software development concepts have improved, Drupal entity (node, etc) forms have gotten more complex, and site owners need to be able to integrate their Drupal data with other systems; maybe it’s time to rethink a few pieces of the Metatag architecture?
Improved handling of multi-value meta tags
An ongoing problem for Metatag over the years has been the need to handle conflicting requirements.
A recurring example of this is where an image meta tag allows multiple values, e.g. og:image. In this scenario the module tries to automatically split the single value into multiple tags using the comma as the separator, so a value of “image1.png, image2.png” would be turned into two separate og:image tags, one for image1.png and another for image2.png. This is all fine with simple filenames, but a filename “apples,bananas-and-pears.png” would be split into two pieces for separate tags - one for “apples” and one for “bananas-and-pears.png”. Additionally, the logic for automatically extracting the URL from image HTML tags gets confused if there’s a comma in the image’s ALT attribute, potentially leading to broken output.
When dealing with more complicated data structures that might have lots of commas, especially with what is possible with the Schema Metatag ecosystem, this problem gets substantially worse.
Thanks to leadership by longtime contributor Karen Stevenson and a number of other contributors, it is now possible to customize the string used to identify multi-value meta tags. Existing sites won’t see any changes automatically: for backwards-compatibility reasons the default is to leave everything as-is with the humble comma. However, there’s now a setting that controls the separator used to split up multiple meta tags. With this functionality available, sites will no longer be left wondering why their “apples,bananas-and-pears.png” image is missing its apple.
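To make the splitting problem concrete, here is a minimal Python sketch of the idea. It is not Metatag’s actual code: the function name split_values and the “||” separator are assumptions picked purely for illustration.

```python
def split_values(raw, separator=","):
    """Split one stored string into individual tag values, trimming whitespace."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


# Simple filenames split as intended with the default comma.
simple = split_values("image1.png, image2.png")

# A comma inside a filename cannot be told apart from a separator.
broken = split_values("apples,bananas-and-pears.png")

# With a different separator (an arbitrary "||" here) the comma survives.
fixed = split_values("apples,bananas-and-pears.png || image2.png", separator="||")

print(simple)
print(broken)
print(fixed)
```

The third call keeps the comma-containing filename whole, which is the behaviour a custom separator is meant to allow.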
JSON data storage
One of the core architectures that both the Drupal 7 and D8+ versions of Metatag share is that they store data in serialized arrays. This became a common practice very early on in the PHP world as it solved a complex problem - storing indeterminate data structures easily - with just two functions. As a result, use of serialized data is endemic in the Drupal ecosystem.
Simply put, a serialized variable is a variable (of any complexity) that has been turned into a string so that it can be stored in the field in the database as if it were something simple like a person’s name. Serialization in PHP performs its magic by reproducing the variable using a predetermined syntax, using identifier characters, double-quotes, colons, semicolons and numbers to build its structure.
For example, an array variable (a type of list) with an element named “description” that has the value “mango” would become the following string:
a:1:{s:11:"description";s:5:"mango";}
If that description value changes to “I really like mangoes”, the serialized format changes to the following:
a:1:{s:11:"description";s:21:"I really like mangoes";}
As can be seen, two parts of the serialized string change - the “mango” piece itself, and the number following the preceding “s” value, which indicates the string’s length. Furthermore, in both examples the “description” element name itself is a string, thus it needs an indicator of its length. Finally, a semicolon is added after the quoted string, indicating an end to this element’s definition. Meanwhile, the array itself has the “a” keyword, and its elements are wrapped with curly brackets.
If a second value is added to the serialized array, maybe the “keywords” element with the value “mango, fruit, delicious”, it would end up like the following:
a:2:{s:11:"description";s:21:"I really like mangoes";s:8:"keywords";s:23:"mango, fruit, delicious";}
In this case, the number following the “a” keyword increases from 1 to 2, indicating that the array now has two items. In the array’s interior two new “s” values are added, one for the name of the new element and another for the element’s actual value.
Extending this concept to nested arrays, as some meta tags require, along with the hundreds of meta tags available in the Metatag ecosystem, it’s clear to see that this can be a complex data structure to work with.
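The format described above can be reproduced with a few lines of Python. This is only a sketch that handles strings and arrays (Python dicts), not a full reimplementation of PHP’s serializer; the function name php_serialize and the nested “robots” value are hypothetical.

```python
def php_serialize(value):
    """Serialize str and dict values in the style of PHP's serialize()."""
    if isinstance(value, str):
        # PHP records the length in bytes, not characters.
        return 's:{}:"{}";'.format(len(value.encode("utf-8")), value)
    if isinstance(value, dict):
        # Each element is its serialized key followed by its serialized value.
        body = "".join(php_serialize(k) + php_serialize(v) for k, v in value.items())
        return "a:{}:{{{}}}".format(len(value), body)
    raise TypeError("this sketch only supports str and dict")


one = php_serialize({"description": "mango"})
two = php_serialize({
    "description": "I really like mangoes",
    "keywords": "mango, fruit, delicious",
})
# Hypothetical nested value, to show how arrays nest inside arrays.
nested = php_serialize({"robots": {"noindex": "noindex"}})

print(one)
print(two)
print(nested)
```

The first two results match the serialized strings shown earlier; the nested one shows an “a” structure appearing where a plain “s” value would otherwise be.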
Querying Metatag D7 or D9’s data to find the “description” meta tag records that contained the string “mango” would be something like this:
SELECT * FROM node__field_metatag WHERE field_metatag_value LIKE '%"description"%mango%'
While that could be written differently using the Select query system in core’s database abstraction layer, the raw SQL shows the complexity. There is also no way of guaranteeing that the “mango” match actually comes from the “description” tag: the pattern matches any record where “mango” appears anywhere after the “description” element name, so a record whose description never mentions mango but whose “keywords” value does would still be returned.
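The false match is easy to reproduce. The following Python sketch uses an in-memory SQLite database as a stand-in for the site’s real database; the entity_id column and the sample rows are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node__field_metatag (entity_id INTEGER, field_metatag_value TEXT)"
)
rows = [
    # Description really contains "mango".
    (1, 'a:1:{s:11:"description";s:5:"mango";}'),
    # Description does not, but the keywords tag does.
    (2, 'a:2:{s:11:"description";s:12:"I like fruit";'
        's:8:"keywords";s:23:"mango, fruit, delicious";}'),
    # No "mango" anywhere.
    (3, 'a:1:{s:11:"description";s:5:"apple";}'),
]
conn.executemany("INSERT INTO node__field_metatag VALUES (?, ?)", rows)

matches = [
    row[0]
    for row in conn.execute(
        "SELECT entity_id FROM node__field_metatag "
        "WHERE field_metatag_value LIKE ? ORDER BY entity_id",
        ('%"description"%mango%',),
    )
]
print(matches)  # [1, 2]
```

Record 2 is returned even though its description is “I like fruit”, because the pattern cannot tell where one element ends and the next begins.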
Using the JSON data structures provided by the database would let much simpler queries be built. The example above could be rewritten as follows:
SELECT * FROM node__field_metatag WHERE field_metatag_value.description LIKE '%mango%'
This architecture allows far more complex queries to be written, which will greatly improve and simplify the ability to work with Metatag’s data, opening the door to far more powerful Views integration, amongst other possibilities.
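The practical difference is that JSON lets a value be addressed by its key. Each database has its own syntax for this, so rather than guess at one, the sketch below performs the same key-based check in plain Python; the function name and sample records are assumptions for illustration only.

```python
import json

# Hypothetical stored values, keyed by entity ID.
records = {
    1: json.dumps({"description": "mango"}),
    2: json.dumps({"description": "I like fruit",
                   "keywords": "mango, fruit, delicious"}),
}


def ids_where_tag_contains(records, tag, needle):
    """Return the IDs whose named tag (and only that tag) contains the needle."""
    hits = []
    for entity_id, raw in records.items():
        value = json.loads(raw).get(tag)
        if isinstance(value, str) and needle in value:
            hits.append(entity_id)
    return hits


by_description = ids_where_tag_contains(records, "description", "mango")
by_keywords = ids_where_tag_contains(records, "keywords", "mango")
print(by_description)  # [1]
print(by_keywords)     # [2]
```

Because the lookup targets one key, a “mango” in the keywords no longer counts as a match for the description.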
In addition to the complexity of writing queries to access serialized data, it was also realized that storing serialized objects could be a security problem. Many have written about this problem, including the Open Web Application Security Project (OWASP) and the well-respected security agency NotSoSecure, and it has led to major vulnerabilities in Drupal and many other open source projects, including a 2018 vulnerability in WordPress.
The ability to natively store JSON data in the database is a requirement for Drupal 10, is recommended for Drupal 9.4+, and is possible today on Drupal 7 and 9+ using the JSON Field module. As a result of this community-wide adoption of JSON storage, Metatag 2 takes the step to convert existing data to JSON, and all data going forward will be stored accordingly. While the initial support does not yet change the underlying database structure, once Drupal fully supports JSON data types we will make that additional change.
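What such a conversion involves can be sketched in Python. This is not the module’s update code, just an illustration under narrow assumptions: it only understands the string (“s”) and array (“a”) forms of the serialized format, and the function names are hypothetical.

```python
import json


def php_unserialize(data, pos=0):
    """Parse the s: and a: forms of PHP-serialized bytes.

    Returns (value, next_position). Other PHP types are not supported.
    """
    kind = data[pos:pos + 2]
    colon = data.index(b":", pos + 2)
    size = int(data[pos + 2:colon])
    if kind == b"s:":
        start = colon + 2            # skip the colon and the opening quote
        end = start + size           # the length is counted in bytes
        return data[start:end].decode("utf-8"), end + 2  # skip quote and semicolon
    if kind == b"a:":
        pos = colon + 2              # skip the colon and the opening brace
        result = {}
        for _ in range(size):
            key, pos = php_unserialize(data, pos)
            value, pos = php_unserialize(data, pos)
            result[key] = value
        return result, pos + 1       # skip the closing brace
    raise ValueError("unsupported type marker at position %d" % pos)


def serialized_to_json(serialized):
    """Convert one serialized string into its JSON equivalent."""
    value, _ = php_unserialize(serialized.encode("utf-8"))
    return json.dumps(value)


converted = serialized_to_json(
    'a:2:{s:11:"description";s:21:"I really like mangoes";'
    's:8:"keywords";s:23:"mango, fruit, delicious";}'
)
print(converted)
```

The result is an ordinary JSON object with “description” and “keywords” keys, with no length counters to maintain.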
JSON-API output
A long-running need of the Drupal community has been reliable output via core’s JSON-API for decoupled websites. While Drupal’s data architecture makes it simple to output the values from fields for a given entity, e.g. a piece of content, it’s more complicated for a system like Metatag that generates its output at runtime based upon tokens and various global settings. In addition, Metatag doesn’t require a field be added to an entity type in order for the output to be rendered; it’s automatically added via template preprocessing.
Work on adding JSON-API support began way back in 2018, but it took a good while to sort out the best approach, with a few ideas discussed. Along the way there were a huge number of contributions from loads of people - feature improvements, bug fixes, testing, test writing, etc, etc. The solutions provided have actually been used in production sites for a few years, but we now have the definitive version finally available in the 2.0.0 release. A huge thank you to everyone who collaborated on this issue; without your continued efforts it would not have been possible.
The new system provides a computed field that can be added to the JSON-API output. This new data structure may be a little more complex than normal field values, but in our opinion it is more robust and open-ended, leaving more room for the data to be more useful for people building decoupled sites.
It should be noted that the final solution actually could have been finished a few years ago, except for one problem. In Metatag on Drupal 7 each meta tag is output in a defined, known order, based upon the order of the group it fits in, and then its individual order. While this logic was partially added to the D8+ version many moons ago, it was incomplete and there were several scenarios where the meta tags were loaded without adhering to the sort order. A change was added which now outputs the meta tags in a reliable order, so this won’t be a problem anymore. Once that change was committed it became a simple task to finish off the JSON-API patch and commit it.
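The ordering rule described here, group order first and then the tag’s own order, amounts to a two-level sort. Here is a small Python sketch of it; the group names, weights and tag list are invented for the example and do not reflect Metatag’s real values.

```python
# Hypothetical weights; in practice they would come from the group and tag definitions.
GROUP_WEIGHT = {"basic": 0, "open_graph": 10}

tags = [
    {"name": "og:image", "group": "open_graph", "weight": 5},
    {"name": "description", "group": "basic", "weight": 2},
    {"name": "og:title", "group": "open_graph", "weight": 1},
    {"name": "title", "group": "basic", "weight": 1},
]


def sort_tags(tags):
    """Order tags by their group's weight, then by their own weight."""
    return sorted(
        tags,
        # The name is a final tie-breaker so the order is always deterministic.
        key=lambda tag: (GROUP_WEIGHT[tag["group"]], tag["weight"], tag["name"]),
    )


ordered = [tag["name"] for tag in sort_tags(tags)]
print(ordered)  # ['title', 'description', 'og:title', 'og:image']
```

Whatever order the tags are loaded in, the output order stays the same, which is what API consumers need.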
Test system overhaul
The Metatag module includes a set of tests to make sure each and every individual meta tag works as expected - that the tag shows up in the form as expected, that it saves correctly, and then that it shows up on the page output correctly. This has been a great addition to the codebase over the years and has helped resolve problems before they were committed.
A key problem in the Drupal 8+ tag tests is that they’re overly verbose. The tests currently take an individual meta tag, load a settings form to verify that this tag exists, save a value on the form for the tag, then load a page to confirm the output is as expected. They then repeat this for two different types of pages (a generic stand-alone page and a node) to make sure different types of pages work properly. After that’s finished, they repeat the whole process for every single meta tag currently available in the module, which happens to be over one hundred. The end result is that the tests on Drupal 9+ take a whopping 10 minutes to complete, even using the latest PHP 8.1 and MySQL 8.
The tag test architecture for the D8+ version was mirrored off the D7 version. The Drupal 7 module did not have separate classes for each meta tag that was available; instead it just had classes for each *type* of meta tag that was available. This resulted in a far more procedural codebase than would be written today, but in the days before PSR-4 autoloading standards a more object-oriented approach was not as common in the Drupal world. When the Drupal 8 version of the module was being developed the test coverage was ported as-is to the new version rather than being rewritten, purely due to time constraints - it was easier to recreate something that already existed rather than create a new architecture.
I’ve long wanted to have more of the per-tag testing uniqueness handled on the tag class itself, making each class much more self-contained; this has been possible with the D8+ version as each meta tag has a separate class file. When I realized that a new v2 was going to be needed for Metatag due to other changes, I decided the time was right to do this work.
The new system takes a much different approach. Rather than checking each tag separately, it loads a form, verifies each of the currently installed meta tags is showing up properly on the form, fills in a value for each one, saves the form, and then checks the output. This makes sure that meta tags from multiple submodules don’t accidentally clash, and reduces the amount of effort needed to run the tests. Furthermore, putting the “does this work?” logic inside of the meta tag class itself allows encapsulating that logic where it should be - on the individual meta tag, not Magic Code™ in another file.
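The general shape of that approach can be sketched in Python. Everything here is hypothetical and heavily simplified, including the class names, the test_value and expected_output methods, and the fake_site_render stand-in; the module’s real tests are written in PHP against Drupal’s testing framework.

```python
class MetaTag:
    """Hypothetical base class: each tag knows how to test itself."""
    name = ""

    def test_value(self):
        return "test value for " + self.name

    def expected_output(self, value):
        return '<meta name="%s" content="%s" />' % (self.name, value)


class Description(MetaTag):
    name = "description"


class Keywords(MetaTag):
    name = "keywords"

    def test_value(self):
        # A tag can override its own test data where the default is unsuitable.
        return "one, two, three"


def fake_site_render(form_values):
    """Stand-in for saving the form and loading the resulting page."""
    return "\n".join(
        '<meta name="%s" content="%s" />' % (name, value)
        for name, value in form_values.items()
    )


def check_all(tags):
    """Fill in every tag at once, render once, and list the tags that failed."""
    form_values = {tag.name: tag.test_value() for tag in tags}
    page = fake_site_render(form_values)
    return [
        tag.name
        for tag in tags
        if tag.expected_output(form_values[tag.name]) not in page
    ]


failures = check_all([Description(), Keywords()])
print(failures)  # []
```

One form submission and one page load cover every tag, and any tag-specific quirk lives on the tag’s own class.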
What we’ve ended up with is a massive cleanup of the tag testing architecture along with a whole bunch of change records for API changes. Going forwards it will be much easier to improve the tests further, especially when we need to refine the tests for one specific tag and not the entire platform.
The amusing part of all of the work cleaning up the test coverage is that it uncovered some bugs! For example, it was discovered that the old Google Plus “rating” tag was identical to the one shipped with the main module, so the duplicate could be removed; this scenario wasn’t tested in the old test architecture as each tag was tested individually, so it only looked for a tag named “rating”, not that there was only one tag named “rating”.
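Checking every tag in one pass is what makes this kind of duplicate visible. A rough Python illustration, using a made-up list of installed tag names:

```python
from collections import Counter


def duplicate_tag_names(tag_names):
    """Return the tag names that are defined more than once."""
    counts = Counter(tag_names)
    return sorted(name for name, count in counts.items() if count > 1)


# Hypothetical list: "rating" is provided by two different submodules.
installed = ["description", "rating", "keywords", "rating"]
duplicates = duplicate_tag_names(installed)
print(duplicates)  # ['rating']
```

A test that only asks whether a tag named “rating” exists passes either way; counting the names is what exposes the second definition.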
An unexpected side benefit of this change is that it now takes significantly less time to run the module’s entire test suite. Prior to the change it would take at least ten minutes to run all of the tests, and this has dropped to just over six minutes - not bad for something that is run tens or hundreds of times per month!
Upgrading to 2.0.0
As with all Drupal module updates, upgrading to 2.0.0 on a Drupal 9 or 10 site will require running the available database updates. Several are necessary to upgrade a site from 8.x-1.x to 2.0.x. Depending on the number of nodes, or terms, or users, etc, which have overridden meta tag values, this update might take a little while. Once the changes have finished it is recommended to export the site’s configuration to make sure all of those changes are retained properly.
As with all major updates, please make sure to have a full backup of the site before running the database updates. While we did add test coverage for the update scripts to make sure they worked correctly, there might be scenarios that were missed. Should any problems occur while updating, please open a bug report in the Metatag issue queue and describe the scenario, along with any error messages that might have been shown.
Incidentally, two people did report problems updating due to some of the update scripts. The problem turned out to be faulty field configurations, which was relatively easy to fix once the cause was identified. The update scripts were adjusted to recognize this type of problem and, if they run into that error, provide a link to the issue, which has documentation on how to find the faulty field and then remove it.
What about version 1?
Now that the version 2 releases are out they are going to become the recommended releases for all new installs, while the older releases drop in priority. As a result, anyone downloading Metatag for the first time will get version 2, while existing sites will be able to upgrade in their own time.
All good things must come to an end; Metatag 7.x-1.x and 8.x-1.x will no longer be supported as of January 1st, 2024.
What about Metatag for Drupal 7?
Site owners will be glad to hear that the changes in Metatag 2.0.0 are also available in Metatag for Drupal 7 in the new 7.x-2.0 release, excluding the testing architecture changes and JSON-API output. Going forward I plan on adding all new features to both version 2 releases, if possible.
The Metatag module for Drupal 7 will be supported for as long as Drupal 7 itself is supported. Based on the Drupal 7 core PSA from 2023 this means it will be supported until January 5th, 2025.
Updates to other modules
Due to the number of API changes in Metatag 2 it was necessary to update some of the other Metatag-related modules in the community.
Most notably, the Schema.org Metatag module has a companion 3.0.0 release that includes updates to support the custom separator, and supports the new test system. While the proposed splitting of this module into smaller pieces didn’t happen, we will continue to support this module in the long term.
The Metatag Import Export CSV module has a version available that is able to automatically detect which version of Metatag it is working with and will adjust accordingly. Because this module only focuses on data stored in fields and not the actual output, the amount of API changes which affected it was rather limited, so one release was able to support both Metatag v1 and v2.
The AGLS Metadata module, used by a small number of sites in Australia to support their government’s expansion of the Dublin Core standard, has a compatible release for Metatag 2. Similar updates have been made available for other Metatag submodules, but it’ll be up to the respective maintainers to commit them and make new releases as necessary.
If any other modules need help updating their integration with the v2 changes, please open a support request in the Metatag issue queue and I’ll see how I can help.
What about Schema.org Blueprints?
Shortly after DrupalCon North America 2022, Webform maintainer and vocal contributor Jacob Rockowitz announced a new module for adding Schema.org data structures to Drupal sites - Schema.org Blueprints. This provides a complete solution for building a site’s data structures using a flexible interface, and can automatically generate the JSON-LD output without needing any additional logic. Furthermore the system is far more flexible than anything Schema.org Metatag can do - it isn’t necessary to build a new submodule for each Schema.org structure that needs to be added, all data structures are automatically available.
As the maintainer of Metatag since 2012, and comaintainer of Schema.org Metatag, I wholeheartedly recommend people look at Schema.org Blueprints as an essential tool to help architect Drupal websites and the best solution to add JSON-LD output to their Drupal 9+ sites. While it might involve a bit of work up front, the benefits of this system far outweigh what is possible using Metatag and Schema.org Metatag. Kudos to Jacob and his comaintainers for building an amazing tool!
The future
As mentioned, I plan to support both Metatag 7.x-1.x and 8.x-1.x throughout 2023, though I ask that site maintainers try updating their sites to the new v2 releases this year.
While I will be supporting the v1 releases for security issues and bugs, new features will only be added to the v2 releases.
Metatag v2 for Drupal 9+ is going to follow semantic versioning for new releases. That means that new features will go into minor releases while bug fixes go into point releases. In practice, once 2.0.0 is released the new features will go into a 2.1.x branch, leading to 2.1.0, while bug fixes go into the 2.0.x branch and lead to 2.0.1, 2.0.2, etc. This will keep the module following a similar release plan to Drupal core and other contrib modules like Webform or Backup & Migrate. Should we end up needing to break backwards compatibility with API changes we’ll then move to a 3.x.y branch, and so on. At some point Drupal core is intending to support the “main” branch concept, at which point we’ll update Metatag’s branching strategy accordingly.
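As a quick illustration of how those rules map a change to a version number, here is a small Python sketch. The function name and the three change labels are assumptions made for the example, not part of any release tooling.

```python
def next_version(current, change):
    """Return the next version number for a given kind of change."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "bugfix":
        return "%d.%d.%d" % (major, minor, patch + 1)
    if change == "feature":
        # A new minor release resets the patch number.
        return "%d.%d.0" % (major, minor + 1)
    if change == "breaking":
        # A backwards-incompatible change resets both minor and patch.
        return "%d.0.0" % (major + 1)
    raise ValueError("unknown change type: %s" % change)


print(next_version("2.0.0", "bugfix"))    # 2.0.1
print(next_version("2.0.0", "feature"))   # 2.1.0
print(next_version("2.1.3", "breaking"))  # 3.0.0
```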
The Drupal 7 ecosystem doesn’t support semantic versioning, so everything will be lumped into future 7.x-2.x releases. Should we hit a point where we need to break backwards compatibility we’ll bump to 7.x-3.x, but I don’t anticipate that happening.
Thank you for using Metatag!
As parting words, I’d like to thank some people. Firstly, thanks to Mediacurrent for sponsoring most of my work on Metatag over the years. Secondly, thanks to Dave Reid for building the initial alpha release of Metatag so many years ago and for entrusting me with the module while he went on to work on the media ecosystem. Last, but not least, thanks to the hundreds of people who have suggested improvements, reviewed patches, created merge requests, reported bugs, provided documentation, and occasionally screamed “help!?!” into the ether hoping to hear a voice that could guide them on their way. Without all of your efforts Metatag would not be the awesome module it is today! Thank you all!