Deep diving into website traffic analysis

When performing a web analysis, you may begin by analyzing the overall trends of your website. These trends include PVs and sessions for the entire website. Make sure that you understand the data you are gathering when conducting web analysis. There have been cases in the past where people reporting on a city’s PVs made a mistake involving the number of search engine results they had received.

When performing web analysis, do not start with detailed data such as the referrer source or the analysis of each separate page. You might end up reaching an incorrect conclusion. Focus first on user characteristics and inclinations. You can avoid making quick, biased judgements if you look at pure data. You also want to make sure that you understand the definitions of the number of sessions and unique users, as the averages available in web analytics tools are often of little use in the overall scheme of things.

What Senior Web Analytics Consultants learn about trend analysis

As an Associate Web Analytics Consultant, you focused on learning about basic trends. As a Senior Web Analytics Consultant, you will learn reference values and how they interact with trends. Reference values are metrics you need to be aware of if there is a problem with your website and can help you fix it accordingly. You also need to be able to match trends with your client’s KPI.

How Logfiles Are Structured


The basic building block of web analytics is the access log. Access logs are automatically recorded files that show the access status of the web server: the access history of each web browser, the content requested (web pages, image files, CSS files, JavaScript files, HTML files, etc.) and any other requests made to the web server.

When you are working with DMP and IoT, you need to be able to utilize and understand access logs right from the start. They are generally stored on the web server, but if the files become too large for the server, past logs will need to be compressed into a zip file.

Aggregated Logfile Data

Before logfiles are processed by analysis tools, the stored data is referred to as raw data or raw logs. The information in these files is written as plain text.

A raw log records the following information:
  • “Who” => Sometimes indicated by “IP address” or “user agent”
  • “When” => Transfer time stamp
  • “From where” => Referrer
  • “What page” => Requested file name
  • “Using what browser” => User agent
  • “Whether the page was displayed or not” => Status code

Data in the raw logs is displayed with each line holding information on a single access. Each line of text holds important data on a user and can be analyzed if you know how to read it.

Hits and the number of hits to a webpage

User agents, including web browsers and search engine spiders, request information from websites. These requests are called hits. Each time a webpage is viewed, a record of the hits (i.e., a logfile or web server log) is automatically created and saved on the web server. Web analytics consultants use these logfiles in their analysis.

While hits are important, they are often misinterpreted as a metric that represents a website’s success. However, the number of hits rarely shows the actual number of users accessing a site or the number of webpages viewed.

The reason for this discrepancy is found in the way a webpage is constructed. Webpages are made up of many individual files, each of which the browser requests from the web server when a user accesses a page. Each file request increases the hit-count for a website.

For example, if a homepage is comprised of one HTML file, one JavaScript file, one layout file, one text file, five images, one Flash file, and one video file, then 11 hits will be added to the hit-count each time that page is viewed and the access will be recorded to the server as a logfile.

The number of rows of text within each logfile represents how many hits a website receives. The lines that show access to HTML files represent PVs.
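The difference between hits and PVs can be illustrated with a short Python sketch. The file paths are hypothetical stand-ins for the eleven files in the homepage example, and treating only ".html" and ".htm" requests as page views is a simplifying assumption.

```python
from pathlib import PurePosixPath

# Hypothetical request paths for a single view of a home page:
# one HTML file plus the supporting files it pulls in.
requested_paths = [
    "/index.html",
    "/js/main.js",
    "/css/layout.css",
    "/data/news.txt",
    "/img/1.jpg", "/img/2.jpg", "/img/3.jpg", "/img/4.jpg", "/img/5.jpg",
    "/media/intro.swf",
    "/media/promo.mp4",
]

# Assumption: a request counts as a page view when it targets an HTML file.
PAGE_EXTENSIONS = {".html", ".htm"}


def count_hits_and_pvs(paths):
    """Return (hits, page views) for a list of requested paths."""
    hits = len(paths)  # every requested file is one hit
    pvs = sum(
        1 for p in paths
        if PurePosixPath(p).suffix.lower() in PAGE_EXTENSIONS
    )
    return hits, pvs


hits, pvs = count_hits_and_pvs(requested_paths)
print(f"hits={hits}, page views={pvs}")
```

One page view here produces eleven hits, which is why the hit count overstates how many pages were actually viewed.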

It is also important to note that repeat users generate fewer hits, as browsers often store webpage files locally (i.e., caching). Once files have been cached, they will not be requested from the server when a webpage is viewed. When using access analysis tools, remember that some display file counts rather than hit counts, and that older tools occasionally label PVs as hits. Make sure that you are analyzing the correct category of data before you begin your analysis.

What You Can See from Hits

When a huge number of hits is requested while the bandwidth of the communication line used by the web server is low, the requested web page lags (i.e., takes a long time to load). Having a webpage that lags is a major problem, as visitors will exit your website if it does not work properly.

Hits were originally utilized when access analysis tools were used for server and network management. Hits (or files) may still be shown (e.g., for a rental server), though this value now has almost no influence on attracting customers to a website.

Understanding Raw Data


The access log file recorded on the web server is often saved with a “.log” file extension or as a compressed file with a “.zip” file extension. It can be opened in a text editor like Notepad just like a normal text file.

Since the access log file at this stage is not yet processed by analysis tools, it is often called raw data or raw logs.

Log formats

When a web server receives a request for its contents, it records the request in a log file. There are four log formats commonly used by servers: the common log format, the combined log format, the Microsoft IIS log file format and the W3C extended log file format.

1) Common log format

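Common Log Format is the log format used by the popular web server “Apache.” It records the following data items: the client host (IP address), the client identity, the user name, the time stamp, the request line, the status code, and the number of bytes transferred.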

A “-” is shown for an item that was not collected by the server.

Note that this log format is not used as often as other formats.

2) Combined log format

This format adds the user agent (OS and browser types) and referrer information to the common log format.
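As a minimal sketch of how a line in this format can be read, the following Python snippet splits one line into named fields with a regular expression. The sample line, its IP address, and its URLs are made up for illustration, and the field names are labels chosen here rather than anything defined by the server.

```python
import re

# Assumed layout: the common log format fields followed by the quoted
# referrer and the quoted user agent.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None


# Made-up sample line for illustration only.
sample = (
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0900] '
    '"GET /products/index.html HTTP/1.1" 200 2326 '
    '"http://www.example.com/start.html" '
    '"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"'
)

entry = parse_combined(sample)
print(entry["host"], entry["time"], entry["request"], entry["status"])
```

Each key answers one of the basic questions an analyst asks of a raw log: who (host, user agent), when (time), from where (referrer), what page (request) and whether it was displayed (status).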

3) Microsoft IIS log file format

This is only one of the many log file formats used by the Microsoft web server “Internet Information Server (IIS).” It records a user’s IP address, user name, date, time, service and instance names, computer name, server IP address, time taken, number of bytes received, number of bytes sent, service status code, Windows status code, request type, and operation target, all separated by commas.

192.168.114.201, -, 03/20/98, 7:50:20, W3SVC2, SALES1,
172.21.13.45, 4502, 163, 3223, 200, 0, GET, DeptLogo.gif
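A rough Python sketch of splitting the sample line above into named fields follows. The field labels are assumptions chosen for illustration, and reading the two numeric codes near the end (200 and 0) as a service status followed by a Windows status is an interpretation of the sample values.

```python
# Illustrative labels, listed in the order the values appear in the sample.
IIS_FIELDS = [
    "client_ip", "user_name", "date", "time", "service_instance",
    "computer_name", "server_ip", "time_taken", "bytes_received",
    "bytes_sent", "service_status", "windows_status",
    "request_type", "target",
]


def parse_iis_line(line):
    """Split one comma-separated IIS log line into a dict of labeled values."""
    # Remove a trailing comma or whitespace before splitting.
    values = [v.strip() for v in line.strip().rstrip(",").split(",")]
    return dict(zip(IIS_FIELDS, values))


# The sample from the text, joined into a single line.
iis_sample = (
    "192.168.114.201, -, 03/20/98, 7:50:20, W3SVC2, SALES1, "
    "172.21.13.45, 4502, 163, 3223, 200, 0, GET, DeptLogo.gif"
)

record = parse_iis_line(iis_sample)
print(record["client_ip"], record["request_type"], record["target"])
```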

4) W3C extended log file format

Unlike the Microsoft IIS format, this format uses spaces to separate collected data. Unlike other log formats, it also lets you select the fields that you want data from. The example below shows a line written to the log file when the checkboxes for [Time], [Client IP Address], [Method], [URI Stem], [Protocol Status], and [Protocol Version] are enabled.

17:42:15 172.16.255.255 GET /default.htm 200 HTTP/1.0

Defining Metrics

Referrers

A referrer is a referencing source or a link source which shows the URL of the web page that was open in the browser immediately before the current web page. By investigating the referrer to the landing page, you can see the search website or search keywords used by the user or the external website that links to your web page.

Transfer Capacity

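The data quantity (number of bytes) that the web server sends in response to each request is recorded. A response with status code 200 normally carries the requested content, whereas a response that sends no content, such as 304 (Not Modified), is recorded as 0 bytes or “-.”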

IP Address and Hostname

The IP address is an identifier given to the user’s device when it connects to the Internet. The host name is the web address used by users when they are trying to reach a page. Rather than typing in a string of numbers they can type something like “web-mining.jp” which is more readable than an IP address. The host name is translated into an IP address by the DNS (Domain Name System).

Analysts can use the IP address and host name to find out how a user is accessing a website.

Status Code

This three-digit status code represents “the meaning of the response” returned by the web server for the request. The commonly used status codes are shown below.
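As a quick reference, the standard reason phrases for codes that frequently appear in access logs can be looked up with Python’s standard library. The selection of codes below is an assumption made for illustration.

```python
from http import HTTPStatus

# A hand-picked selection of codes that often appear in access logs.
COMMON_CODES = [200, 301, 302, 304, 403, 404, 500, 503]


def describe(code):
    """Return the code followed by its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


for code in COMMON_CODES:
    print(describe(code))
```

Codes in the 200s indicate success, the 300s redirection or cached content, the 400s a problem with the request (such as a missing page), and the 500s a problem on the server.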


User agents

A user agent is a string of characters that a communication device can transmit to a web server. The user agent contains information about the properties of the device, such as its OS, the browser it is using and the device’s hardware.

There is a wide variety of user agent formats in existence. For example, Apple uses “AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10B329 Safari/8536.25” to identify the iPhone. Microsoft uses “Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)” to identify different versions of IE on a PC. Google uses “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)” to identify the Googlebot search engine spider. Goo identifies their Ichiro 4.0 spider with the string “ichiro/4.0 (http://help.goo.ne.jp/door/crawler.html).”

Note that the user agent can be disguised by the user and is not always correct.

How to Utilize the User Agent

You can obtain the following information from the user agent.
  • Type of OS
  • Type of browser
  • Cell phone carrier

This information can be useful in many ways. For example, you can determine the types of browsers accessing the site so that your business can optimize it accordingly.

The user agent is the information defined for the browser used by the end user. This information includes the browser type and version as well as the OS type and version.

The format depends on the browser and OS types.
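Because the format varies, a simple way to get started is substring matching. The Python sketch below is a rough illustration only: the keywords are assumptions picked to cover a handful of well-known user agent strings, and real user agents need far more rules than this.

```python
def classify_user_agent(ua):
    """Very rough guess at visitor kind, OS and browser from substrings."""
    ua_lower = ua.lower()

    # Search engine spiders usually name themselves in the string.
    kind = "spider" if ("googlebot" in ua_lower or "ichiro" in ua_lower) else "browser"

    if "windows nt" in ua_lower:
        os_name = "Windows"
    elif "iphone" in ua_lower:
        os_name = "iOS"
    else:
        os_name = "unknown"

    if "msie" in ua_lower or "trident" in ua_lower:
        browser = "Internet Explorer"
    elif "chrome" in ua_lower:
        # Chrome strings also contain "Safari", so Chrome is checked first.
        browser = "Chrome"
    elif "safari" in ua_lower:
        browser = "Safari"
    else:
        browser = "unknown"

    return {"kind": kind, "os": os_name, "browser": browser}


ie9 = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(classify_user_agent(ie9))
print(classify_user_agent(googlebot))
```

Since the user agent can be disguised, results like these are best treated as estimates.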

Combining the Windows OS and Internet Explorer

The most important pieces of data carried in the user agent are version tokens and platform tokens. A version token shows the specific browser name and version, while a platform token identifies the OS.

For Firefox, Safari and Google Chrome

Firefox, Safari and Google Chrome follow the user agent format officially defined by Mozilla.

Checking the User Agent

To display the user agent of the browser used, enter the following in the browser’s address bar.

javascript:alert(navigator.userAgent)

Session Duration

The session duration indicates how long a user viewed the measurement target (web page or website). Page session duration and website session duration data is available.

Page session duration indicates how long a user stayed on the page.

The website session duration indicates how long a user stayed on the website. It is also called session duration per visit.

Website session duration is the sum of the page session durations from the start until the end of a single session. Note that you cannot acquire the page session duration for the last page the user viewed before leaving the website, because the next page the user opens lies outside the website and its request time is not recorded.

Also, keep in mind when looking at website session duration data that “the page session duration for the last page is not included.” In the following example, the website session duration is eight minutes because the page session duration for the “product page” viewed last is not included.

Sessions

Average session duration is the term used to describe the average time users spend on a website per session. In many cases, this number is calculated by dividing the sum of the durations of all sessions by the total number of sessions. Note that this number includes sessions that ended in immediate bounces or exits.

The total duration of all sessions (in seconds) / number of sessions

For example:

The total session duration for a website is 2000 minutes or 120,000 seconds, with 200 total sessions, so:

The average session duration is 2000/200 = 10 minutes or 600 seconds

Individual session durations are calculated differently depending on whether there are engagement hits on the last page of a session.

If there are no engagement hits on the last page, then the duration is calculated as follows:

The time of the first hit on the last page - the first hit on the first page

For example:

Page A’s first hit was at 11:20 PM

Page B’s first hit was at 11:25 PM

Page C’s first hit was at 11:30 PM

11:30 - 11:20 = A session duration of 10 minutes or 600 seconds.

If there are engagement hits on the last page, then the duration is calculated as follows:

The time of the last engagement hit on the last page - the first hit on the first page

For example:

Page A’s first hit was at 11:20 PM

Page B’s first hit was at 11:25 PM

Page C’s first hit was at 11:30 PM; last engagement hit: 11:35 PM

11:35 - 11:20 = A session duration of 15 minutes or 900 seconds.
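Both rules can be expressed in one small Python function. This is a sketch of the calculation as described here, not the implementation of any particular tool; the date is arbitrary and the function name is made up for illustration.

```python
from datetime import datetime


def session_duration_seconds(page_first_hits, last_engagement_hit=None):
    """Duration of one session.

    page_first_hits: first-hit time of each page, in viewing order.
    last_engagement_hit: time of the last engagement hit on the last page,
    or None when the last page has no engagement hits.
    """
    start = page_first_hits[0]
    end = last_engagement_hit if last_engagement_hit is not None else page_first_hits[-1]
    return int((end - start).total_seconds())


# Pages A, B and C from the examples (11:20 PM, 11:25 PM, 11:30 PM).
first_hits = [
    datetime(2024, 1, 1, 23, 20),
    datetime(2024, 1, 1, 23, 25),
    datetime(2024, 1, 1, 23, 30),
]

without_engagement = session_duration_seconds(first_hits)
with_engagement = session_duration_seconds(first_hits, datetime(2024, 1, 1, 23, 35))
print(without_engagement, with_engagement)
```

The two calls reproduce the 600-second and 900-second results of the worked examples.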

When the session interval is shorter than 30 minutes

In the figure to the left, the user exited the website at 18:10 and visited the same website again at 18:35. In this case, the session interval of 25 minutes is also included in the website session duration, which totals 35 minutes.

Some analysis tools connect two successive sessions of which the interval is shorter than 30 minutes even if the browser was closed during the interval. Other tools check the referrers and separate sessions for inflows from an external website. Check with the vendor of each analysis tool for details.
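The 30-minute rule can be sketched in Python as follows. The grouping logic is a simplified assumption about how such tools behave, and the 18:00 start time is inferred from the 35-minute total in the example rather than stated in it.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # threshold described in the text


def split_into_sessions(hit_times, timeout=SESSION_TIMEOUT):
    """Group one user's hit times into sessions.

    A gap of `timeout` or more between two successive hits starts a new session.
    """
    sessions = []
    for t in sorted(hit_times):
        if sessions and t - sessions[-1][-1] < timeout:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


day = datetime(2024, 1, 1)
# Leaves at 18:10 and returns at 18:35: a 25-minute gap, so still one session.
short_gap = [day.replace(hour=18, minute=0),
             day.replace(hour=18, minute=10),
             day.replace(hour=18, minute=35)]
# Returns at 18:50 instead: a 40-minute gap, so a second session starts.
long_gap = [day.replace(hour=18, minute=0),
            day.replace(hour=18, minute=10),
            day.replace(hour=18, minute=50)]

one = split_into_sessions(short_gap)
two = split_into_sessions(long_gap)
print(len(one), one[0][-1] - one[0][0])
print(len(two))
```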

Google Analytics measures the average session duration in the same way as above, but it can also use any events that have been tracked to gather last engagement hit data if you set it to do so.

Analyzing Metrics


Metrics such as page views are the basis of web statistics analysis. They can help businesses gather data on their progress and can help them determine areas in which they must improve. Metrics gathered by analysis tools are useful, but you must be aware that they occasionally misrepresent reality.

Successful web analytics consultants must be able to use analysis tools to take advantage of the wealth of data at their fingertips. Knowing the types of metrics available and how best to use them is an invaluable skill.

Pageviews

Calculating the Average Number of PVs Shows an Index of Interest for your Website.

In general, the more PVs you are receiving the better your website is doing.

For example, Mr. A found apparel shop “ABC” through a search engine, visited the website, but left just after viewing a single page. Can you say that Mr. A was interested in this shop? He may have found the information he needed on the first page of this website. However, a general EC website requires a user to have visited at least five pages before conversion is registered as shown below. So, satisfaction from a user who viewed just one page and left the website should be considered less important than that of a user who converted.

[EC Website Case Study]

“Product information page” => “Cart page” => “Personal information entry page” => “Confirmation page” => “Thank you page”

In addition to these pages, many visitors will choose to visit the company profile page as well. They often want to learn about the company and how reliable it is. A general rule you can follow is that the more a user wants to buy a product, the more pages they will view.

Situations exist when the average PV count is not as important as it would be usually.

Blogs in particular tend to record a lower than average number of PVs. This is because many blog readers just browse new articles and then leave. Sites with fewer than two average PVs are not rare.

Remember that, depending on the purpose of the website, your client company may reach its goal even if the average PVs are low.

Analyze PVs by Comparing Data with Other Periods

When analyzing PVs, it is difficult to find issues or points to improve just by viewing the figures. It is important to compare index data, not only PVs, with data from other periods. For example, compare the current data with the data from the previous month or the same period last year. You can see an increase or decrease in values accurately by comparing data from different periods. When you find a rapid change in PVs, try to figure out the cause.
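A period comparison usually comes down to a percentage change. The Python sketch below uses hypothetical monthly PV totals; the figures and the dictionary keys are made up for illustration.

```python
def percent_change(current, previous):
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("previous period has no data to compare against")
    return (current - previous) * 100.0 / previous


# Hypothetical monthly PV totals.
monthly_pvs = {"2023-12": 40000, "2024-11": 50000, "2024-12": 45000}

month_over_month = percent_change(monthly_pvs["2024-12"], monthly_pvs["2024-11"])
year_over_year = percent_change(monthly_pvs["2024-12"], monthly_pvs["2023-12"])
print(f"vs. previous month: {month_over_month:+.1f}%")
print(f"vs. same month last year: {year_over_year:+.1f}%")
```

In this made-up case PVs fell against the previous month but rose against the same month of the previous year, which is why looking at more than one comparison period helps.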

Consider Changes in PVs per Referrer

If an apparel company spends $100,000 in advertising expenses in November and $50,000 in December, PVs might decrease in December in proportion to the decrease in advertising. PVs can also decrease due to a drop in your search engine ranking. If a link to the company page is posted to a popular blog, PVs may increase.

It is important to confirm PVs per referrer, per keyword and per search engine in your analysis. Compare the values with other periods as mentioned above to analyze the cause of changes in PVs.

Estimating the Size of a Website through Data from an Index.

Portal sites like Amazon and Rakuten use PVs as an index to indicate the size of their website. These types of massive sites can have PV counts into the tens of billions monthly.

Since PVs are proportional to the volume of products being offered, portal sites with more content naturally have many PVs. In contrast, specialized websites (e.g., an information website related to access analysis or movement) have less content and therefore fewer PVs.

When publishing an ad, PVs can be used as an index to indicate the ability of a medium to attract customers (e.g., web page or website where the ad will be published).

Sessions

The more sessions you can get, the better your situation. When this index value is high, you know that your website is popular. For an EC website, it can be said that the number of sessions is equal to the number of prospective customers.

To increase sessions, you need to analyze inflow. Inflow to a website includes links from search engines, referrer websites (e.g., the website was introduced in a blog and users visited via the link in the blog article), ads, newsletters and bookmarks. To increase sessions, you need to check how many users visit the website from each type of inflow. If sessions are smaller than you were expecting or not likely to increase, analyzing the inflow and search phrases and conducting SEO or publishing a keyword ad are effective measures you can take to counter the problem.

Analyzing Unique Users

Unique users, page views and the number of sessions

If the number of unique users increases, that means that the number of new visitors to the website has increased. While this is a good thing, you will need to compare the number of unique users to the number of sessions and page views to see the true scope of the situation.

If the number of sessions has increased in conjunction with the increase in unique users, then you know you are getting a higher influx of visitors to your site, but if the number of PVs does not increase, you can assume that the content of the website is not holding the user’s attention and you need to change or improve your content.

Visit frequency and the number of visits from unique users


Visit frequency tells us how many days have passed since the last time a user visited a website. It can also tell us how many times the user has visited. Using the data in conjunction with the number of unique users can shed light on inflow to your website and if there are any areas that can be improved in order to increase conversions.

How to take advantage of the new unique users rate

The new unique user rate shows how many new users your website is attracting. If attracting new users is your goal, you can use these metrics to show how successful your marketing strategy is over the course of a set period.

Note that a high new unique user rate is ideal. However, if you run an EC website, you also want to increase your rate of returning users.

Google Analytics can show you the number of new users, returning visitors, sessions, repeats, PVs, and the interval of repeats.

Demographic information

Programs like Google Analytics can show you demographic information on users. It collects data from various sources, including cookies and surveys, to show the analyst ages, genders, locations and user interests. The data can help you predict user behavior, trends and ways to target specific users.

Using Conversions for web analytics

The Basics of Improving your Conversion Rate.

There are three basic things you should do if you realize that you are not getting as many conversions as you desire.

Review the customer attraction method to increase inflows

The fewer customers there are, the fewer conversions will happen. Consider how to increase inflows to the website. Internet ads, newsletters, SEO, and SEM are effective ways to attract more customers.

Improve areas where users tend to exit from the website

If you notice a trend where users are exiting from the website before conversion, try to pinpoint the areas where they are leaving the most and work to improve them. There may be issues with content, slow loading times or even broken links. You can check the exit and bounce rates for each page individually and also check for various search phrases and referrers to figure out what is keeping your customers from taking that final step. If the exit rate is high for a specific referrer, you might want to reevaluate just how you can attract and keep their interest.

Improve the process customers experience as they are converting

The final act of conversion is always preceded by a specific page or pages on a website that show the customer exactly what they are going to be receiving after they make their purchase or request. The shopping cart of an EC website is a good example of this kind of page. If there is a problem with the link leading to the shopping cart, application page, etc., do your best to resolve the issue or improve the page so that the user will convert.

E-Commerce

When a user converts they supply the company with product names, sales revenue, member IDs etc. You can use the collected data to track the path a user followed to get them to the conversion page, inflow and outflow.

Google Analytics provides a method called E-Commerce tracking to analyze e-commerce. In particular, data on the order completion page is fed to Google Analytics, and the daily sales amount and sales amount per product can be determined.

Attribution Analysis

The following explains attribution analysis.

When you conduct an attribution analysis, focus on the contribution to conversion

For this type of analysis, focus your attention on data from the web server. You will want to take a close look at inflow and inflow referrers and how they relate to sessions that, hopefully, progress into conversions. These relations and where referrals are coming from are counted as contributions to conversion.

For example, a user visited your client’s website for the first time after following a referral link in Ad A. They liked what they saw and ended up completing their visit with a conversion. The contribution inflow from this visit is due, 100%, to Ad A. A second user followed the same link in Ad A but chose not to convert that session. The second user visited the website again from Ad B on another day finally leading them to convert. In this case, Ad A and Ad B had a 50/50 contribution rate.

*Distribution ratio of the conversion value varies depending on several factors, for example, the order of visits and classification of inflows (charged or free).
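The 100% and 50/50 splits in the example correspond to giving every inflow in a converting user’s path an equal share. The Python sketch below assumes that equal-split rule; as the note above says, actual tools may weight visits differently, and the function name is made up for illustration.

```python
from collections import defaultdict


def equal_split_attribution(paths):
    """Share one conversion equally among the inflows that preceded it.

    paths: one list of inflow names per converting user, in visit order.
    Returns the total conversion credit per inflow.
    """
    credit = defaultdict(float)
    for touchpoints in paths:
        if not touchpoints:
            continue  # no recorded inflow, nothing to credit
        share = 1.0 / len(touchpoints)
        for source in touchpoints:
            credit[source] += share
    return dict(credit)


# First user: converted on the visit from Ad A.
# Second user: visited from Ad A, then converted after coming back from Ad B.
paths = [["Ad A"], ["Ad A", "Ad B"]]
credit = equal_split_attribution(paths)
print(credit)
```

Across the two conversions, Ad A receives 1.5 conversions of credit and Ad B receives 0.5, even though each ad was the last click exactly once.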

Measure an ad’s “indirect effects” on customers

The concept of the indirect effect of ads has been studied in the past. Looking at the way ads can affect memory to create post-impressions and lead to post-clicks is important to knowing the full effect of your marketing campaign.

The indirect effects seen by post-clicking and post-impressions are also called click-through conversions and view-through conversions respectively. To analyze view-through conversion, distribution data on the ad server is mainly used.

You can find out the number of assists, in addition to conversions, when you use Google AdWords ads. They are called assist conversions and measuring them is useful so that you know the medium or ad that indirectly contributed to a conversion.

Session Duration

Notes on Analyzing Session Duration

When a session is terminated during a stay (i.e., no request has occurred for 30 or more minutes), a new session is established and the session duration is counted as having ended.

Some analysis tools set the session duration for a page from which the visitor bounced (page session duration) to one minute.

When a visitor leaves for another website and returns before the session is disconnected, a session duration longer than the actual duration may be recorded.

Example of User Session Duration

Assume that a user browsed the website in the following sequence:

0:00:00 Home => 0:00:30 Top of category => 0:01:15 Product page A => 0:02:00 Product page B => 0:02:30 Follows link to another website

The session duration in this case is two minutes; lasting from 0:00:00 when the user accessed the home page until 0:02:00 when the user started browsing product page B.
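The same sequence can be worked through in Python. This sketch assumes that a page’s duration is the gap until the next page on the same website is requested, which leaves the last page without a measurable duration.

```python
def page_durations(visits):
    """Return (page, seconds) pairs; the last page gets None.

    visits: list of (page, seconds_from_session_start) in viewing order.
    """
    if not visits:
        return []
    durations = []
    for (page, start), (_, next_start) in zip(visits, visits[1:]):
        durations.append((page, next_start - start))
    # No later request on this website follows the last page.
    durations.append((visits[-1][0], None))
    return durations


visits = [
    ("Home", 0),
    ("Top of category", 30),
    ("Product page A", 75),
    ("Product page B", 120),
]

per_page = page_durations(visits)
session_seconds = sum(d for _, d in per_page if d is not None)
print(per_page)
print(session_seconds)
```

The measurable page durations add up to 120 seconds, and the 30 seconds spent on product page B before following the external link are not counted.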

Google Analytics uses the term average session duration on visit.

Notes on Analyzing Page Session Duration

  • Usually exited pages and bounced visits do not get added into session duration measurement calculations as they cannot be measured with 100% accuracy. Some analysis tools, however, provide a means to measure these categories. For example, by regularly checking that the user has remained on a page for a period of time by using JavaScript, the duration on the last page can be estimated.
  • If the session is terminated during a period when the user is inactive, the session duration for that page is not measured. Some access analysis tools such as Google Analytics start new sessions when a user exits the website and then returns after running a search with a search engine. If a user exits from the website, runs a search with a search engine, and then returns to the website before the session is terminated, the time spent on the new search is included in the session duration.
  • When calculating the average session duration for each page, pay attention to the denominator. Basically, divide the total session duration recorded for the page by the number of sessions in which the page was viewed.

Note that the correct session duration cannot be obtained if sessions for which “the page was the exit page” are included in the calculation. This is because the session duration for the exit page is unknown. Also note that some tools add the “last exit session” to the denominator by default.
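The effect of the denominator can be seen in a short Python sketch. The durations are made-up values, and `None` is used here as an assumed marker for sessions in which the page was the exit page.

```python
def average_excluding_exits(durations):
    """Average over sessions where the page duration could be measured."""
    measured = [d for d in durations if d is not None]
    if not measured:
        return None
    return sum(measured) / len(measured)


def average_including_exits(durations):
    """Average that keeps exit sessions in the denominator (counted as zero)."""
    if not durations:
        return None
    return sum(d for d in durations if d is not None) / len(durations)


# Hypothetical durations in seconds for one page across four sessions;
# None marks sessions in which this page was the exit page.
durations = [60, 120, None, None]

print(average_excluding_exits(durations))
print(average_including_exits(durations))
```

With these figures the average is 90 seconds when exit sessions are excluded and 45 seconds when they are kept in the denominator, so it is worth checking which rule a tool applies.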

Is a Longer Session Duration Better?

The best session durations depend on the structure or purpose of the website and page.

For example, for a website or page that directs visitors depending on which search engine they come from or what they want, it is better for the users to be able to access the required information as soon as possible. A shorter session duration is therefore better.

On the other hand, a website or page that is dedicated to providing information generally assumes their visitors are reading the contents of their website thoroughly. In that case a session duration with some length is preferred.

What is “Session duration 00:00:00” in Google Analytics?

As mentioned above, the session duration cannot be calculated for the exit page, thus it is shown as “00:00:00.”

Relationships with Other Indices

Observe the session duration per page from the viewpoint of the visitors. For a page that provides information, Wikipedia for example, longer session durations are preferred. A long duration suggests that visitors are taking in the information, and such pages tend to be visited frequently and for fairly long periods.

A page with a long session duration and many PVs can be understood as being browsed by many people for a long time. However, if the session duration is long but the PV count is lower than expected, it can be said that “the contents are interesting but many users have not noticed this page or are not being led to it.”

In this case, check for a problem with the flow line (the path users follow through the site) or with the links that lead users to the page.

Analyzing Multiple Devices

You can see what device users use for access from the user agent data recorded in the logfiles. By comparing inflow or conversions per device type, you can propose a website design suitable for each device type. In Google Analytics, you can use the custom dimension to obtain user attributes on the form confirmation screen or completion screen and use this data for analysis.

*Segmentation by Google Estimates

Google Analytics shows the age, sex, and interest categories as estimated by Google. Note that this attribute is an estimate made by Google. Take the data provided with a grain of salt.

Behavioral analytics for smartphones and tablets

In recent years, with the rising popularity of smartphones and tablets, many businesses are prioritizing the improvement of mobile-friendly websites rather than their regular websites. A web analytics consultant must be aware of this trend and act in a manner that will benefit their client.

You will need to understand how mobile devices interact with PCs and how mobile sites compare to websites. Being able to identify session durations, PVs, etc. on smartphones is important if you are going to make sure that they are fulfilling user needs and resulting in conversions.

Google has standardized smartphone support functions and other services are following in their footsteps. Other tools help analysts understand user behavior on mobile sites and how you can make sure that your site is displaying correctly on small screens.

You will also need to make sure that your site responds to the now-standard motions that control touch-screens.

Tap

A tap of the finger behaves like a click on a mouse. Double tapping is equivalent to double clicking, while pressing and holding brings up a menu or more options.

Flick

This is a left-right motion that allows you to transition between pages. You can go back to the previous page, load images in a playlist, etc.

Pinch in / out

This action allows a user to make text or an image larger or smaller as it zooms the page in or out.

Swipe

This motion, a finger slide, allows the screen to scroll up or down.

You can analyze these actions in order to improve the various elements on a mobile site.

Notes on Smart Phone Analytics

Smart phones can accept cookies. This allows analytics software to gather data and make comparisons as it does with PCs. In the example from Google Analytics below, a “Yes” for “Mobile (including tablets)” means there is inflow from smart phones and Tablet PCs.

If there is a large inflow from the mobile category, it may be necessary to set up a mobile website. If there are differences in the bounce rate or average PVs, you may need to improve said website.

If the numbers you are getting when you analyze traffic from mobile devices are low, there may be issues such as the page display lagging, characters and images being too small or tapping on links and buttons being difficult. Consider improving these issues.

Many users use smart phones with one hand and you need to consider ways to make the experience user friendly. Clicking a button can be hard if it is in a bad location or if it’s too small. If you are creating an application, make use of operation methods specific to smart phones, such as flicking, for better usability.

Note that there are differences in web analysis between smart phones and PCs. The following discusses these differences.

Referrer

Since accesses from smart phones come via browsers, you can obtain the referrer.

Cookies and JavaScript

Note that some smart phones accept first-party cookies only and do not accept third-party cookies by default.

Model name, OS and carrier name

The model name as well as the user agent can be obtained from smart phones, but there are cases where determining the carrier is difficult. If a model is released only for a specific carrier, the carrier can be determined. However, for models supplied by multiple carriers (e.g., the iPhone), you cannot determine the carrier.

Unique user identification

For some smart phone OS types, you may be able to identify unique users from the terminal ID number and the number of the SIM card. Currently, browser cookies are mainly used for unique user identification, as with PCs.

Identification of unique users by line connection

Smart phones access the Internet using the cellular network. For connections via LTE, 3G or 4G, accesses are made via gateways, so identifying unique users from IP addresses is difficult. If a user is on a specific Wi-Fi connection, identification may be possible, and the ISP or organization name may be identified in this case, as with PCs.

Notes on Smart Phone Analytics

Smart phone users often reach websites through referrals from apps or non-official websites. With apps like Yelp, a user can search for places without a specific one in mind, read reviews and then follow a link to the main webpage. These apps often show search results based on location rather than popularity. They also allow for PPC advertising through companies like AdMob, InMobi, and Admaker. There are also affiliate ads called “reward ads” which reward visitors for clicking or watching a video.

Web Analytics for the Smart Phone

Refer to the user analysis mentioned above when considering the user friendliness of your mobile website. Since users mainly browse websites on smart phones while in transit, you can narrow down their likely purposes for browsing. They will often use computers later to collect more in-depth information.

Notes on Deviations

Total Amount of Unique Users and Sessions

The sum of daily (weekly) unique users is not necessarily the number of weekly (monthly) unique users

When aggregating unique users, note that the sum of daily (weekly) unique users is not necessarily the number of weekly (monthly) unique users. This is because the number of unique users is not the total number of visits: multiple visits by the same user are counted as one unique user. If the same user visits a website on different days, they are counted as one unique user on each of those days, but as just one unique user for the entire week or the entire month.
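The counting rule above can be illustrated with a minimal sketch. The visit log, the dates and the visitor IDs below are hypothetical; a real tool would use its own identifier, such as a cookie value.

```python
from datetime import date

# Hypothetical visit log: (visit date, visitor ID).
visits = [
    (date(2016, 11, 7), "user_a"),
    (date(2016, 11, 7), "user_b"),
    (date(2016, 11, 8), "user_a"),
    (date(2016, 11, 8), "user_a"),  # second visit on the same day
    (date(2016, 11, 9), "user_a"),
    (date(2016, 11, 9), "user_c"),
]


def daily_unique_users(visits):
    """Return {day: number of distinct visitors seen on that day}."""
    users_by_day = {}
    for day, user in visits:
        users_by_day.setdefault(day, set()).add(user)
    return {day: len(users) for day, users in users_by_day.items()}


def period_unique_users(visits):
    """Return the number of distinct visitors over the whole period."""
    return len({user for _, user in visits})


daily = daily_unique_users(visits)
sum_of_daily = sum(daily.values())    # user_a is counted once per day
weekly = period_unique_users(visits)  # user_a is counted once in total

print(sum_of_daily, weekly)
```

In this made-up data the daily figures add up to 5, while only 3 distinct users visited during the period, so the daily sum overstates the weekly figure.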

The sum of daily (weekly) sessions is not necessarily the number of weekly (monthly) sessions

Similarly, when aggregating sessions, the sum of daily (weekly) sessions is not necessarily the number of weekly (monthly) sessions. A session that spans two days can be counted as one session on each day in the daily figures, but as a single session in the figure for the two days combined.

*Some tools separate the session across two days at 00:00 and count it as two sessions.

Errors Due to the Number of Days in a Month

Notes on comparing PVs and other data with other months

When you compare PVs or sessions of one month with those of another month, pay attention to the number of days as well as the number of weekdays in each month.

Since these numbers differ for each month, simply comparing the monthly totals will show a difference larger than the actual one. Using the daily average, obtained by dividing the monthly total by the number of days, gives a more accurate analysis.

For example, in a non-leap year there is a difference of three days between January and February. If the daily PVs are ten for both months, the monthly total PVs are 310 for January and 280 for February, giving the illusion that PVs decreased in February. When comparing one month with another, do not simply compare the monthly totals.
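As a rough sketch of this normalization, the following uses Python's standard calendar module to turn monthly totals into daily averages. The year 2017 is an assumption chosen only because it is a non-leap year, and the helper names are hypothetical. The weekday count ignores public holidays, which would need a separate calendar.

```python
import calendar


def daily_average(monthly_total, year, month):
    """Divide a monthly total by the number of days in that month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return monthly_total / days_in_month


def weekday_count(year, month):
    """Count Monday-to-Friday days in a month (public holidays are ignored)."""
    days_in_month = calendar.monthrange(year, month)[1]
    return sum(
        1
        for day in range(1, days_in_month + 1)
        if calendar.weekday(year, month, day) < 5  # 0-4 are Monday-Friday
    )


# Monthly totals from the example: ten PVs per day in both months.
jan_avg = daily_average(310, 2017, 1)
feb_avg = daily_average(280, 2017, 2)

print(jan_avg, feb_avg)
print(weekday_count(2017, 1), weekday_count(2017, 2))
```

Both daily averages come out the same, which shows that the apparent drop in the February total is only an effect of the shorter month.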

Also, months with fewer weekdays (e.g., the end of the year and months with many holidays) tend to show decreased PVs. Therefore, be sure to consider the number of weekdays.

When comparing data with other months, you should compare it not only with the previous month but also with the same month of the previous year. Comparing with the same month of the previous year removes seasonal factors (e.g., summer vacation and Christmas holidays) from the comparison.

If the types of visitors include both B2B and B2C, analyzing data by separating time periods into “daytime on weekdays” and “nighttime and holidays” may produce more accurate results.

In particular, B2B users access more during daytime while B2C users access more during nighttime and on holidays. Separating the time period in this way allows you to understand the trends of each user group more easily.

Screen resolution, browsers, and operating systems

What You Can Find Out From Screen Resolution Data

You can obtain screen resolution data from access analysis data. The screen resolution is the resolution of the display monitor (e.g., PC monitor) and is generally shown as a number like “1280 x 1024.”

You may think that screen resolution data is not useful for improving a website, but it is. From this data, you can see what screen size the majority of visitors use and whether they view the site in landscape or portrait, and you can reflect this information in the design of the website to improve the visitor experience. It is not necessary to check screen resolution data every day, but it is important to know which settings are the most popular so you can cater to the majority of your visitors and build a website that loads quickly and is user friendly.

What You Can Discover From OS and Browser Data

You can obtain the types and versions of the user’s OS and browser from the access log.

The OS is the operating system running on a PC, such as “Windows 8” or “Mac OS.” A browser is the software program the user uses to access the Internet, such as “Internet Explorer” or “Firefox.”

Information regarding the types and versions of a user’s OS and browser can be used as a reference when building a website. For example, since the displayed design differs a little depending on the type and version of the OS and browser, this data tells you which OS and browser combinations to test in order to check that your layout appears consistent across systems. Mobile versions of websites should be checked as well, as they will differ from layouts that work on PCs or tablets.

Browsers and OS’s

As it is almost impossible to build a website that conforms to all of the screen sizes, OS’s, and browsers available, check with your client to see which browsers they want to support. This will save both you and your client time and money.

Characteristics and Deviation in Web Analytics Data


Power Distribution and Average

Long Tail Data

Since web analytics data fluctuates widely, long tail data is common.

Long tail data means that when data values are arranged in descending order, the low-value end of the chart looks like the long tail of a dinosaur. For example, if you arrange search phrases used in search engines in descending order of PVs, there will be an enormous number of search phrases that each have only one PV, forming a long tail.

One characteristic of such data is that the mean and variance are not very meaningful.

While many of the data items used by web analytics tools are shown as means, you should check carefully whether the mean really represents the population.

In particular, be careful with session duration. The average session duration is unlikely to reflect actual user behavior or lead to a result.

Mean and Median

The “mean” is an index frequently used to analyze various data. A mean is the value calculated by dividing the total of the values by the number of values. When any of the values is extremely different from the others, the mean may be far from the actual condition.

For example, if 19 visitors browsed a page for 20 seconds and one visitor browsed the same page for 300 seconds, the duration for that one visitor is extremely different from the others. The mean duration in this case is (19*20+300*1)/20 = 34 seconds. This is far from the actual condition.

For such a situation, you can use the “median.” The median is the value at the center when data values are arranged in descending order. If there are 21 values, the 11th value is the median. If there are 20 values, the 10.5th value is the median. Since the “10.5th value” does not exist, it is actually the mean of the 10th and 11th values. For the example above, the median is “20 seconds” and is close to the actual condition.
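The example above can be checked with a short sketch using Python's standard statistics module. The list of durations simply restates the 19 visitors at 20 seconds and the one visitor at 300 seconds.

```python
import statistics

# 19 visitors browsed for 20 seconds, one visitor browsed for 300 seconds.
durations = [20] * 19 + [300]

mean_duration = statistics.mean(durations)      # pulled up by the one outlier
median_duration = statistics.median(durations)  # mean of the 10th and 11th values

print(mean_duration, median_duration)
```

The mean is 34 seconds and the median is 20 seconds, matching the figures in the text.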

Reasons for Deviation

Data output from a tool differs depending on the purpose.

A report from an ad effect measurement tool is intended to show the results brought by an ad, and a report from an access analysis tool is intended to show the results brought by website traffic. They use different result indices, so there are differences in the data.

For example, conversion values differ between a listing ad report and access analysis data. A listing ad report counts a conversion as a result even if the user makes it several days after clicking the ad, while in many cases access analysis does not count a conversion as a result unless it is made during the session that started from the ad.

Difference Due to Cached Data

When measuring PVs, there are differences in values depending on the tool used. This is caused by how the tool handles cached data.

If the tool uses the server log method, a PV is counted each time a file is accessed. When cached data is read, no file access occurs and no PV is counted.

If the tool uses the web beacon method, a PV is counted each time the beacon tag is read. Even when cached data is read, a PV is counted as long as the beacon tag is read.

Difference Due to Data Acquisition Timing

Access analysis tools show different values depending on how log data is acquired. While a tool using the web beacon method counts a PV when a page is displayed in the browser, a tool using the server log method counts a PV when a request for the page is made to the server. Therefore, if the user requests a page from the server and then closes the browser before the page is displayed, the PV is counted by a tool using the server log method and not by a tool using the web beacon method. In this situation, tools using the server log method report more PVs than those using the web beacon method.
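The two effects described above, cached data and acquisition timing, can be sketched as a small simulation. This is a simplified model under stated assumptions, not how any particular tool works: the PageView class, its two flags and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    served_from_cache: bool  # browser reused a cached copy, no server request
    tag_executed: bool       # page was displayed far enough to run the beacon tag


def count_server_log_pvs(views):
    """Server log method: count only views that reached the web server."""
    return sum(1 for view in views if not view.served_from_cache)


def count_web_beacon_pvs(views):
    """Web beacon method: count only views where the beacon tag was read."""
    return sum(1 for view in views if view.tag_executed)


views = [
    PageView(served_from_cache=False, tag_executed=True),   # normal view
    PageView(served_from_cache=True, tag_executed=True),    # cached: beacon only
    PageView(served_from_cache=True, tag_executed=True),    # cached: beacon only
    PageView(served_from_cache=False, tag_executed=False),  # closed early: server log only
    PageView(served_from_cache=False, tag_executed=True),   # normal view
]

server_log_pvs = count_server_log_pvs(views)
web_beacon_pvs = count_web_beacon_pvs(views)

print(server_log_pvs, web_beacon_pvs)
```

With this sample data the server log method reports 3 PVs and the web beacon method reports 4. Which method reports more depends on the mix of cached views and early exits in the actual traffic.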

Difference Due to Tag Location and Specifications (of Analysis Tools and User Environment)

With the same access analysis or ad effect measurement tool, different data can be obtained depending on the tag location and/or specifications. If you place tags near the top of the HTML file and near its end, different data is returned. This is because if the user closes the browser before a web page is displayed completely, tags near the top are recognized while tags near the end are not.

In addition, analysis tools differ in how they connect page views into sessions and how they judge that a session has ended. Generally, a session ends after 30 minutes with no activity, but some tools allow the user to change this period. Some tools use referrers or requests to judge whether a session continues.
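To see how the timeout setting changes the session count, here is a minimal sketch for a single visitor. The function name, the sample timestamps and the rule that a gap strictly longer than the timeout starts a new session are assumptions for illustration. Real tools may treat the boundary, midnight or referrer changes differently.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Count sessions for one visitor.

    A gap longer than `timeout` between two consecutive hits
    is treated as the start of a new session.
    """
    sessions = 0
    previous = None
    for current in sorted(timestamps):
        if previous is None or current - previous > timeout:
            sessions += 1
        previous = current
    return sessions


# Hypothetical hits from one visitor: gaps of 20, 25 and 75 minutes.
hits = [
    datetime(2016, 11, 7, 9, 0),
    datetime(2016, 11, 7, 9, 20),
    datetime(2016, 11, 7, 9, 45),
    datetime(2016, 11, 7, 11, 0),
]

default_sessions = count_sessions(hits)
short_timeout_sessions = count_sessions(hits, timeout=timedelta(minutes=15))

print(default_sessions, short_timeout_sessions)
```

The same four hits are counted as 2 sessions with a 30-minute timeout and as 4 sessions with a 15-minute timeout, so two tools with different settings can report different session counts from identical traffic.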

Also, data obtained depends on the settings of the browser and/or security software. If a browser is set to accept cookies from a first party but not from a third party, only access analysis tools which use first-party cookies can obtain data.

Addressing Differences in Data

So far, this section has covered cases where the value of a single index (e.g., PVs) differs depending on the tool.

If this difference is large, check the settings and specifications of the tools to find the cause. If you consider that data from one tool reflects the actual condition better than the others, clarify the basis of your judgment in a report or explanation. If you cannot judge which data is most realistic, show all of the data to the participants and think about a solution together.

If you cannot find the cause of the difference, choose the data which seems more accurate, and regularly check the other data. If the difference is not large, choose one yourself or in discussion with the participants, and clarify the reason why you chose it.

Web analytics is an excellent method for analyzing user behavior in real time; however, the results differ depending on which tools are used. Spending too much time troubleshooting the cause of the differences will not lead to results and often turns out to be a waste of time. Do not forget that you are conducting web analytics in order to bring business results, and you should focus on the comments and activities that lead to those results.
