OPERATING BLIND - THE REALITY OF DIGITAL OPERATIONS IN 2019

 

2019 looks set to be another year in which websites and applications rely on 3rd-party services to create rich, personalized experiences for their visitors. Integrating 3rd-party code into your application brings many benefits, from increased agility and domain expertise to functionality that helps you keep pace with giants like Amazon and Netflix. The reasons for using these services have been established for some time; the challenge of operating these complex, integrated applications, however, is only now starting to crystallize.

 

Part of the challenge is a lack of clear ownership of or responsibility for this 3rd-party code, part is that this code bypasses normal code review and approval processes, and part is a lack of tools and technologies that provide complete visibility into application behavior. This is a real challenge for digital operations teams when close to 40% of all the code a visitor interacts with can come from 3rd-party services!

 

As teams outside of the normal approval, testing, and operating process add 3rd-parties, whether they are marketing technologies, new product functionality, or business and financial planning tools, the operation of each new service is rarely factored into the decision to add the technologies in the first place. The obvious (but wrong) reaction to this issue would be to remove most 3rd-party code from your web application. But it is simply impossible to operate a robust modern website without analytics, advertising, personalization, payment, and other critical capabilities, capabilities that would not make sense to build in-house. As a result, these tools and technologies are here to stay.

 

When we say digital operations teams are ‘operating blind’, we mean that these teams do not have the visibility into, or control of, 3rd-party services that they need to maintain smooth operations for all visitors, on every device and browser. Since a large share of the code that now makes up your application originates outside of your servers and data centers, there is often little that can be done to identify problems, let alone fix them, when things start to misbehave. Most organizations have robust monitoring, alerting, and incident response tools, training, and teams responsible for the code and infrastructure they build. If something goes wrong with a 1st-party resource, database, or application, there are likely two, three, or more people being alerted within minutes, and a whole team available to write, test, and implement a solution. This robust level of support is most likely not in place (or certainly not as accessible) for 3rd-party services added to your website.

 

For example, let’s say that your retargeting vendor's JavaScript collides with your website’s in-house JavaScript and breaks the “add to cart” functionality of your product pages. How would you find out about this? JavaScript error monitoring? A decline in “add to cart” events tracked in your marketing analytics tool? Declining revenue on your bookings platform? None of these methods is likely to correctly attribute the problem to the retargeting vendor. But let’s say you actually are able to pin down what is happening: how would you then act to stop it? How can you turn off this service? What if the problem only occurs on specific devices or browsers?

 

These types of 3rd-party service incidents are almost impossible to anticipate but are becoming more and more common as your code shares space with dozens of 3rd-parties. 3rd-party services can be updated weekly, daily, or even continuously, so there is no way to know if the version you tested when you originally chose that vendor is still compatible with your website. And as organizations add more 3rd-parties, and especially as the teams making buying decisions are not the ones operating the service or managing its impact on the overall website experience, digital operations teams are going to get stuck fixing new and complex problems.

 

According to a recent post by TechRepublic, one of the 10 largest problems operations teams will face in 2019 will be this management of cloud services:

“For instance, IT is increasingly taking on the role of supporting cloud services in terms of aggregation, customization, integration and governance.” “Rather than focusing solely on engineering and operations, [Operations] must develop the capabilities needed to broker services; these will require different roles to the I&O of old.”

 

So, what can operations teams do in 2019 so they’re not operating blind?

 

First, teams should put in place solutions that provide visibility into how their applications actually come together for real visitors, in the browser, at runtime. This must go beyond marketing analytics and RUM tools: to be effective, the emphasis must be on how the various 3rd-parties integrated into the application are working. Statistics like load time, average load time over seven days, size (KB), and JS errors should be captured and reported for each 3rd-party so teams get a complete picture.
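
As a rough illustration of the kind of report described above, here is a minimal Python sketch. It assumes a hypothetical feed of per-resource samples already collected from visitors' browsers; the ResourceSample fields, the summarize_third_parties function, and the example hostnames are all invented for illustration and do not describe any particular product. It also treats any host other than the exact first-party host as a 3rd-party, which is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean
from urllib.parse import urlparse


@dataclass
class ResourceSample:
    """One hypothetical sample reported from a visitor's browser."""
    day: date          # day the sample was collected
    url: str           # URL of the script or resource that loaded
    load_ms: float     # load time in milliseconds
    size_bytes: int    # transfer size in bytes
    js_errors: int     # JS errors attributed to this resource


def summarize_third_parties(samples, first_party_host, today):
    """Group samples by 3rd-party host and compute per-host statistics."""
    week_start = today - timedelta(days=6)  # seven days, including today
    by_host = defaultdict(list)
    for s in samples:
        host = urlparse(s.url).hostname
        # Simplification: anything not served from the exact first-party
        # host is counted as a 3rd-party.
        if host and host != first_party_host:
            by_host[host].append(s)

    report = {}
    for host, rows in by_host.items():
        todays = [r for r in rows if r.day == today]
        week = [r for r in rows if week_start <= r.day <= today]
        report[host] = {
            "load_ms_today": mean(r.load_ms for r in todays) if todays else None,
            "avg_load_ms_7d": mean(r.load_ms for r in week) if week else None,
            "avg_size_kb": mean(r.size_bytes for r in week) / 1024 if week else None,
            "js_errors_7d": sum(r.js_errors for r in week),
        }
    return report
```

Calling summarize_third_parties(samples, "www.shop.example", date.today()) would return one entry per 3rd-party host, which could then feed a dashboard or an alert.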

 

Second, teams should be given control of these 3rd-party services so that when issues do pop up, they can be remedied quickly. Most digital operations teams lack any sort of ‘circuit breaker’ for when a service causes an issue for a visitor - whether this is broken functionality, increased loading times, or buggy behavior. Operations teams should be able to isolate problems through the visibility gained by adding a robust monitoring solution and then act surgically to fix issues. Simply pulling the plug on an entire service because it is causing an issue just for Firefox users on Macs is not the correct solution.
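
To make the idea of a surgical ‘circuit breaker’ concrete, here is a minimal Python sketch of a manually tripped, scoped kill switch. It is not the classic circuit breaker pattern that trips automatically after a number of failures; an operator opens and closes it by hand. The BreakerRule and CircuitBreaker names, and the idea that your page-rendering or tag-loading layer would consult should_load before including a service, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BreakerRule:
    """Hypothetical rule: block one service for a matching slice of visitors."""
    service: str
    browser: Optional[str] = None  # None matches any browser
    os: Optional[str] = None       # None matches any operating system


class CircuitBreaker:
    def __init__(self):
        self._rules = set()

    def trip(self, service, browser=None, os=None):
        """Open the breaker for a service, optionally scoped to browser/OS."""
        self._rules.add(BreakerRule(service, browser, os))

    def reset(self, service, browser=None, os=None):
        """Close a previously opened breaker with the same scope."""
        self._rules.discard(BreakerRule(service, browser, os))

    def should_load(self, service, browser, os):
        """Return False only when an open rule matches this visitor."""
        for rule in self._rules:
            if rule.service != service:
                continue
            if rule.browser not in (None, browser):
                continue
            if rule.os not in (None, os):
                continue
            return False
        return True
```

With this sketch, breaker.trip("retargeting", browser="Firefox", os="macOS") would stop the service for Firefox users on Macs only, while every other visitor keeps getting it.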

 

Third, and possibly most important, operations teams should be deputized to have the necessary conversations with marketing, product, and engineering about the impact of 3rd-party services on the performance, reliability, and security of their applications. Marketing and product teams do not want to make websites slower or less reliable; they just lack visibility into the impact of these tools and services - just like the operations team. Once operations teams gain visibility and put up some guard rails, they can initiate a conversation about what services are necessary, what pages they should load on, what the performance budget for each service should be, and so on.
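
One way to turn the outcome of that conversation into something checkable is to write down, per service, the pages it may load on and its performance budget. The Python sketch below is only an illustration under that assumption; ServicePolicy, check_policy, and the example numbers are hypothetical, and the measurements passed in would come from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePolicy:
    """Hypothetical agreement between operations and the service's owner."""
    allowed_pages: frozenset  # page paths the service is allowed to load on
    max_load_ms: float        # load-time budget in milliseconds
    max_size_kb: float        # size budget in kilobytes


def check_policy(policies, service, page, load_ms, size_kb):
    """Return a list of violations; an empty list means within policy."""
    policy = policies.get(service)
    if policy is None:
        return [f"{service}: no policy has been agreed"]
    violations = []
    if page not in policy.allowed_pages:
        violations.append(f"{service}: not approved for {page}")
    if load_ms > policy.max_load_ms:
        violations.append(
            f"{service}: load time {load_ms:.0f} ms exceeds {policy.max_load_ms:.0f} ms"
        )
    if size_kb > policy.max_size_kb:
        violations.append(
            f"{service}: size {size_kb:.0f} KB exceeds {policy.max_size_kb:.0f} KB"
        )
    return violations
```

A service with no agreed policy is reported as a violation on purpose, so that newly added 3rd-parties surface in the conversation instead of slipping through unnoticed.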

 

If 2018 is any sort of indicator for how 2019 will go, 3rd-party services are sure to cause headaches for all teams, not just operations. Having the right 3rd-party analytics and control solution in place will prevent these headaches from cropping up in the first place.