Using Web Analytics for Modeling Application Usage in Performance Tests

In my last article, I wrote about the paradigm shift in web application architecture and why performance testers have to re-think their strategy around testing Rich Internet Applications (RIA) for performance. Web application development processes and user expectations continue to grow by leaps and bounds. Sadly, the techniques and approaches employed to test those applications have not kept pace. The good news is that newer tools are emerging and methodologies are being defined to close that gap, so it is essential that performance testers make use of them at every phase of the performance testing lifecycle.

Early in the performance testing lifecycle, testers gather requirements, and collecting application usage statistics is typically one of the primary tasks. In this article, I will explain how “Web Analytics tools” can be a great source of historical data about application usage and user behavior.

Traditional Web Server Log Approach


Traditionally, performance testers have relied on web server log files to collect historical application usage data. Web server logs were, and still are, a great source of information: they contain enormous amounts of data on web usage activity and server errors. Downloading the log files from the web server and running report generation tools helps testers extract meaningful information from them. However, web server logs have their limitations. For example:

  • Usage data contained in web server logs does not include most “page re-visits” due to browser caching. For example, if a user re-visits a page, the browser serves it from cache and no request reaches the web server.
  • While data contained in the Web server logs can provide insights into system behavior, it does not help much in understanding “user/human behavior”.
  • Web server logs do not capture a user’s geographical location, the browser they used, or the device/platform they accessed the application from, all of which are vital metrics for understanding user behavior.
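To see what the traditional approach yields (and what it misses), here is a minimal sketch that counts successful page hits from raw access-log lines. It assumes Common Log Format; the sample lines are made up for illustration. Note that cached re-visits simply never appear in this data.

```python
import re
from collections import Counter

# Matches Common Log Format request lines: client, identd, user,
# timestamp, "METHOD path protocol", status code, bytes sent.
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3}) \d+'
)

def page_hits(log_lines):
    """Return a Counter of successful (2xx) page requests per path."""
    hits = Counter()
    for line in log_lines:
        m = LOG_PATTERN.match(line)
        if m and m.group(2).startswith("2"):
            hits[m.group(1)] += 1
    return hits

# Hypothetical log lines for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2012:10:00:00 +0000] "GET /home HTTP/1.1" 200 1024',
    '10.0.0.2 - - [01/Jan/2012:10:00:05 +0000] "GET /home HTTP/1.1" 200 1024',
    '10.0.0.1 - - [01/Jan/2012:10:00:09 +0000] "GET /cart HTTP/1.1" 404 512',
]
print(page_hits(sample))  # cached re-visits never show up here
```

A report built this way undercounts real page views for exactly the reasons listed above.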


While web server log files are still a great way to measure usage statistics, new ways to measure web traffic have emerged that provide information from a user perspective rather than a system perspective. A large number of organizations are implementing what are called “Web Analytics tools” as part of their web application infrastructure. For example, industry reports suggest that Google Analytics, a leading Web Analytics tool, is used on 57% of the top 10,000 websites.

Web Analytics and insight into User Behavior

“Website visitors are now people, not clicks”, says a leading Web Analytics provider. In short, that is what Web Analytics tools bring to the table. They track and analyze what real users do when they are on a web application: the pages users land on, where they navigate within the site, and where they exit from. Web Analytics tools use a technique called “page tagging”, which combines JavaScript running in the browser with cookies to capture these metrics. Because the metrics are captured from real users’ browsers, they are more accurate and insightful. Web Analytics tools are primarily used by companies for search engine optimization (SEO) and for measuring advertising and marketing initiatives. However, these tools can also provide valuable metrics for performance testers in designing load test scenarios that realistically mimic user behavior.

Fig. A snapshot from a typical Web Analytics dashboard

Realistic Estimate of User Idle Time


Anyone who has been doing performance testing for a while knows that “user idle time” is an important metric to factor into a realistic load test scenario: it determines the rate at which the web server is hit with load. Yet it is often represented inaccurately in a test. Some testers use the idle time captured during script recording, some use values provided by business owners, and others simply guess. Web Analytics tools provide two key metrics, “Average Time Spent on the Site” and “Average Time Spent on the Page”, which performance testers can use to accurately determine the user idle time between business transactions and take the guesswork out of the equation.
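As a sketch of how those metrics could drive pacing, the snippet below derives an average think time from hypothetical analytics numbers (the values are made up, not real dashboard output) and samples around it for each virtual user:

```python
import random

# Hypothetical values read off an analytics dashboard.
avg_time_on_site = 240.0   # seconds per session
pages_per_visit = 6.0      # average pages viewed per session

# Average idle time between page views, derived rather than guessed.
avg_think_time = avg_time_on_site / pages_per_visit  # 40 s

def sample_think_time(mean=avg_think_time, spread=0.25):
    """Randomize pacing +/-25% around the analytics-derived mean,
    so virtual users do not all fire in lockstep."""
    return random.uniform(mean * (1 - spread), mean * (1 + spread))

print(round(avg_think_time, 1))  # 40.0
```

Most load testing tools accept a randomized think-time range like this directly in their scenario settings.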

Realistic Emulation of Browser Caching Behavior

Industry reports suggest that cached pages can account for up to one-third of all page views. Because of its obvious performance benefits, browser caching is used extensively by application developers. Web server logs do not (and cannot) capture user activity for cached pages, as no request is made to the web server. In contrast, Web Analytics tools do track visits to cached pages (because they capture usage from the user’s browser) and thus provide a more accurate picture of browser caching on the web application. Performance testers can use this information to determine what percentage of total application usage is served from cache and emulate that browser behavior in the load test scenario.
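A simple way to fold that percentage into a scenario is to decide, per simulated page view, whether the virtual user actually hits the server. The sketch below uses the one-third figure cited above as an illustrative cache share:

```python
import random

# Share of page views served from browser cache, per analytics data.
# The 33% figure is illustrative.
CACHED_SHARE = 0.33

def simulate_page_views(n, cached_share=CACHED_SHARE, rng=None):
    """Return (server_hits, cache_hits) for n simulated page views."""
    rng = rng or random.Random(42)  # seeded for repeatable test runs
    server_hits = sum(1 for _ in range(n) if rng.random() >= cached_share)
    return server_hits, n - server_hits

hits, cached = simulate_page_views(10_000)
print(hits, cached)  # roughly a 67/33 split
```

In a real tool this decision would translate into skipping the request (or sending a conditional request) for the cached fraction of page views.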

Emulation of User Behavior on 3rd party Web elements like AJAX & Flash

How many of you have been asked to load test 3rd party web components like Flash, Silverlight, and AJAX and felt that you had no historical usage data to work with? You’re not alone, and there is a reason for that: web server logs are not good at tracking usage data for these elements. Web Analytics tools, however, do a great job of tracking and reporting user activity on Flash-driven elements, embedded AJAX page elements, page gadgets, file downloads, and so on. So the next time you are involved in testing one of these 3rd party elements, you have a savior.

WAN Emulation by factoring in User Geo-Location


Lately, product and business owners of Rich Internet Applications have been asking performance testers to identify geographical locations where page load times exceed a specific threshold. As you already know, user traffic for web applications can come from virtually anywhere in the world. To build a load test scenario that satisfies this objective, performance testers must gather historical usage data that breaks customers down by the geographical region they access the application from. Web Analytics tools can pinpoint the geographical location of users through a technique called geo-location, typically by mapping each user’s IP address to a physical location. Performance testers can use this valuable data in conjunction with WAN emulation or cloud-based tools to design a load test scenario where user load is generated from those geographic regions, factoring in the network conditions of each region and emulating them accordingly.
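One way to turn a geographic breakdown into a scenario is to split the virtual-user population in proportion to the analytics-reported traffic share, with an added network latency per region for WAN emulation. The percentages and latencies below are made-up examples:

```python
# Hypothetical regional traffic shares from an analytics report,
# and illustrative round-trip latencies to add per region.
regional_share = {"North America": 0.50, "Europe": 0.30, "Asia": 0.20}
added_latency_ms = {"North America": 40, "Europe": 120, "Asia": 250}

def allocate_vusers(total, shares):
    """Split a virtual-user count by regional traffic share."""
    alloc = {region: int(total * share) for region, share in shares.items()}
    # Give any rounding remainder to the largest region.
    remainder = total - sum(alloc.values())
    alloc[max(shares, key=shares.get)] += remainder
    return alloc

plan = allocate_vusers(500, regional_share)
for region, vus in plan.items():
    print(f"{region}: {vus} VUs, +{added_latency_ms[region]} ms latency")
```

Cloud-based load generators in each region, or a WAN emulator configured with these latency figures, would then carry the allocated load.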

Conclusion


In short, Web Analytics tools do not replace web server logs as a source for usage activity measurement and reporting; rather, the two complement each other. Web server logs, combined with key “user behavior” metrics from Web Analytics tools, can provide a complete picture of usage activity for the web application under test. It is time performance testers made use of this valuable tool when modeling application usage for their load tests.

A Paradigm Shift in Web Application Architecture and Why Performance Testers Should Care


Modern browsers are turning into miniature operating systems. They can multi-task browsing processes, allocate and manage memory, collect garbage and much more. They are capable of running complex web applications on their own with minimal server interaction. There is now a paradigm shift in the web application architecture as a majority of application processing is shifting from the server to the web browser. The web browser, once called a “Thin” client has become a big fat cat lately.

The Browser Wars

Meanwhile, leading browser makers are fiercely competing against each other for dominance in web browser market share. This so-called “browser war” has set off major developments in the capabilities of popular browsers like Internet Explorer, Firefox and Chrome, as more and more features are built into them. Browsers are now capable of processing data from multiple sources, such as Content Delivery Networks (CDNs), ad networks and analytics providers, and presenting it to the user. Browser makers are also scrambling to bundle as many new features and enhancements as possible into their browsers to stay ahead in the race. Mozilla, for example, recently announced a new rapid release schedule to bring faster updates to its Firefox browser; Google has been doing this with Chrome for a while now. However, these improved capabilities have also introduced additional complexity to web application architecture.


Mobile Computing and the Rapid Adoption of Newer Web Standards

On the other hand, the W3C, the organization that sets web standards, has also recognized the need for newer standards in this era of mobile computing. HTML, the core technology for structuring and presenting content on the web, is undergoing a major upgrade as part of the W3C’s HTML5 specification. Among other things, HTML5 will make it possible for users to view multimedia and graphical content on the web without installing proprietary plug-ins. Related standards like CSS (which defines layout) and the DOM (which defines interaction with data objects) are also getting an overhaul, and technologies like CSS3 and XMLHttpRequest (XHR) are gaining wide adoption and popularity. These newer web standards put the onus on web developers and front-end engineers to build interactive web applications that are fast, highly responsive and behave like traditional desktop software.

Web 2.0 and the Semantic Web

At a time when information technology professionals and end users are still getting acquainted with Web 2.0 technologies, work is already underway on specifications for Web 3.0, popularly referred to as the “Semantic Web”. One can only expect rapid developments in this trend in the days ahead, and this new web application architecture is here to stay. It is also becoming more and more important that modern web applications are built to make the best use of these technologies.

The Arrival of Mobile and Cloud computing

A recent study by Nielsen found that 4 in 10 U.S. phones are now smartphones, and Morgan Stanley has predicted that the number of mobile Internet users will exceed desktop Internet users by 2015. These studies and predictions aside, there is no doubt that smartphones and tablet PCs have revolutionized the way users browse the web. As a result, speed and performance have become key considerations in mobile web application development, as bandwidth has become a limited and valuable resource.

Rapid adoption of cloud computing technologies like software as a service (SaaS) and platform as a service (PaaS) is in a way forcing this shift in web application architecture as cloud computing providers deliver applications through the Internet.

Why Speed and Performance Are Key in This New Landscape

One of the key driving factors behind this shift in web application architecture is that end users have become intolerant of slow page response times and are demanding a highly responsive, enhanced user experience. A recent study found that the average user will not wait more than 2 seconds for a web page to load. At the same time, the average user has also become more sophisticated, demanding a web experience that is interactive, rich in multimedia and easy to use. As a result, speed and performance have taken center stage and become primary considerations in web application development.


Performance Bottleneck outside the Firewall

As I said earlier, modern web application architecture has introduced several new components that play a vital role in determining the speed and performance of a web application. These new components are double-edged swords: while they provide huge business benefits, they can also turn into performance bottlenecks that impact page load times and user experience. Typical performance problems involving components that sit outside the application firewall include:

  • Issues related to DNS caching, lookup and routing.
  • Bottlenecks caused by 3rd party widgets, plug-ins and buttons on the web page.
  • Problems due to inefficient deployment of content delivery network (CDN) solutions, which would mean that the CDN-based performance acceleration is not effective.
  • Availability issues arising due to mobile network bandwidth limitations.
  • Performance issues as a result of larger file sizes and inefficient loading of JavaScript & CSS objects and modules.
  • Web app slowdowns due to Ad networks and analytics tools.

And so on…

And now to the last part: why should performance testers care?

For so long, we performance testers have focused our testing efforts mainly (or in many cases, only) on server performance. Rightly so, since the bulk of application processing happened at the application and database tiers. However, the paradigm shift in web application architecture has turned web pages into rich Internet applications. This shift affects not only web architects and designers; it also changes the way software testers look for defects in modern web applications. Performance testers, in particular, will have to re-think their strategy for testing web applications.

The traditional focus on server performance and optimization alone is no longer adequate. It is like looking at only one side of a coin, and every coin has two sides. A new field within performance engineering, called web performance optimization, is emerging, and it is high time performance testers took note of it.


Hi, my name is Suraj and I'm a Software performance & automation engineer based in Chicago. I speak and write on topics and trends in the test automation & performance engineering space.