Google BigQuery and Google Data Studio Reports
As you already know from our previous blog post, Daktela offers a wide range of reporting tools. To make things simple, you can think of Daktela as having 2 branches of Reporting and Analytical tools:
- Statistics and Analytics available in the Daktela web app
- Google solution
While Daktela is a complex web application and contains robust analytical tools out of the box, in some cases it may be useful to opt for an external service. You may want to think about using one particularly if you:
- Have large amounts of data
- Use several PBXs
- Need to join tables
- Want advanced report data filtering
- Are worried about PBX loads
- Require faster loading times
The modern and super-powerful Google solution is an ideal candidate for these purposes. We will have a quick look at it in this article.
In general, the solution is a combination of data storage (Google BigQuery), reporting tool (Google Data Studio), affiliate services (Google Cloud Platform – security, management, maintenance, monitoring) and data flows (Daktela), making up an ETL and BI portfolio.
A clear advantage is that the solution is accessible using a web browser. No dedicated software is needed; everything is available from any PC using a Google account with sufficient rights. System operability is dependent only on your internet connection speed.
Daktela – automatic synchronisation
There are periodic tasks set up on bigquery.daktela.com that transform the data from PBXs and load it onto BigQuery (BQ) in regular intervals. These so-called run scripts are written in Java.
The synchronisation interval is 120 minutes and 24 hours. Data is loaded periodically every 2 hours and at the end of each day at midnight, another consolidation batch is loaded to ensure data integrity. On selected PBXs, it is possible to shorten the 2-hour interval to 1 hour.
In isolated cases, seemingly duplicate entries may be created in BQ. The run script looks for duplicates, its output is monitored and notifications are sent to monitoring software. Duplicates are automatically removed during the daily sync. This means that you don’t need to do anything extra to keep your reporting data up to date.
Since the run scripts of all projects connected to BQ are running on the bigquery.daktela.com server, they are scheduled to run at different times so they don’t compete against each other and so system resources are used effectively. This also contributes to a relatively short processing time. A synchronisation step (loading an hourly or daily batch for all project instances) is a matter of minutes to tens of minutes – typically 10-30 mins, up to 60-90 mins in extreme cases. When debugging, synchronisation can also be started manually.
All scripts output a text file log.
Access to the bigquery.daktela.com server is secured using an SSH key. The server is managed entirely by Daktela IT administrators who also manage user access.
BigQuery is a fully-managed, serverless data warehouse (DWH) that enables scalable analysis over extremely large amounts of data (in the order of petabytes: 1PB ≈ 1024 TB ≈ 1 million GB). It is a Platform as a Service that supports querying using ANSI SQL.
Google also provides affiliate functionality such as machine learning, geolocation tools, analytics and statistics tools and user access and rights management.
BigQuery architecture is entirely managed by Google, meaning users don’t need to worry about things like scalability, computing power distribution, backing up, maintenance etc. The solution is unique because its design lets you receive results of requests over terabytes of data in seconds and petabytes in minutes.
This allows you to load your reports and interactively filter and sort your data in real time.
BigQuery is not just a database – it is a non-relational online analytical processing database system optimised for reading and analysis. It is also possible to insert data and to connect BQ to external data sources.
Its key properties are scalability, reliability, availability, low running costs, self-repair possibilities and decentralised architecture. DWH alternatives to BQ are e.g. Amazon Redshift and Microsoft Azure SQL.
Data in BQ is organised into data sets (similar to a database) made up of tables, views and other entities such as machine learning models. Rights to access, edit, delete etc. can be set for the entire dataset or for the individual tables that are a part of it.
The Google Cloud platform is used to manage user accounts and users must have an active Google account to work with BQ. All accesses are managed and checked by Daktela administrators.
Quotas and limits
Google limits maximum system load by setting up limits and quotas for a given time period (e.g. number of queries, size of transferred data etc.) BQ is made for large projects with vast amounts of data and from a big data perspective, even the biggest and most complex Daktela PBXs are a small and simple data set.
Google Data Studio
Google Data Studio (DS) is a data visualisation and reporting tool similar to Microsoft Power BI or Tableau. It is a web application with a user-friendly interface where you can create reports by adding tables and graphs and complement them with text and visual elements.
Data Studio can be used for:
- Creating and editing individual reports
- Managing reports that the user has access to
- Viewing reports
- Creating, editing and managing data sources
- Managing access to these reports and data sources
Each report is made up of:
- A data source
- Access settings
Multiple reports can use the same data source and a report can use several different data sources.
Access rights are granted in the Google environment separately for each service:
- BigQuery accesses and rights can be set up for each:
- Project (internal Daktela accesses)
- Data Studio accesses and rights can be set up for each:
- Shared data source
In both BQ and DS, the user sees only the entities that they have access rights to. They have no way of finding out about other users’ entities (data sets, tables, data sources, reports) until they try to access them via a URL.
DataStudio – data sources
The term Data source collectively refers to:
- Data source connection set up
- Transformation and formatting of received data
- Data synchronisation frequency
- Access rights
All of these items make up a Google DS entity that the user can work with. The entity is not a data source but rather a data model or scheme. You can also think of it as a connector to the data source, typically a DWH.
Google Data studio can use the connection to the data source to connect to different DWH kinds, from Google native ones (Google BQ, Google Sheets, Google Analytics, Google Ads, Google Surveys, Google Cloud etc.) to commonly used ones (Microsoft SQL Server, MySQL, PostgreSQL, CSV files etc.) to third-party connectors. A full list of connectors is available on the Google website.
By connecting a data source, a new entity is created. It defines:
- Connection to BigQuery or a data origin
- Queried table or SQL specification
- Dimension formatting
- Aggregated metrics
- Parameters – typically a time period for data
- Access – which users can edit the data source and/or edit its definition
- Data synchronisation frequency (“data freshness”) – typically 1 hour, ranging from 1 minute to 12 hours
- Data source sharing – limited to current report or allowed sharing in multiple reports
There are other ways to narrow down data in a report other than directly in the data source. You can also use “controls” and “hidden filters” that are described below. It is also possible to connect individual data sources (“data blending”), providing complex data aggregation that would otherwise be difficult to achieve for users without good SQL knowledge.
Every data request from a report is translated into an SQL query. Its result is temporarily saved (cached) for the amount of time set up in “data freshness”, delivering almost immediate response times and low costs per query.
The report owner automatically becomes the owner of the data source that is created in the report. It is possible to add shared data sources from your own reports to other reports that you own. To use and edit data sources owned by other users, you need to request the owner’s permission and they must grant it.
If you remove an embedded data source from a report, it will be deleted. If you remove a shared data source from a report, it will not be deleted.
DataStudio – reports
After you set up your data source, you can create a report by adding graphs where you can select dimension and/or metrics from the data source. To make the layout as clear as possible, you can add multiple report pages and add other elements such as text and pictures.
A field is a data column. There are 2 different types: dimensions and metrics. Dimensions contain categories or entities that you want to measure. Metrics describe dimensions using a number.
Apart from fields automatically loaded from the data source, you can create your own field by:
- Combining dimensions to create a new dimension
- Combining metrics with each other or with a dimension to create a new metric
A dimension is a category of information or data, or more precisely a data array. E.g. when measuring website traffic, some dimensions could be Browser, Web page, Language etc. If you are using a database-based or table-based data source, a dimension represents a column of your table.
In DS, you can change the title, data type (text, number, date etc.) and description of each dimension.
A metric is an aggregate calculation or operation applied to a data set, e.g. number of entries, sum, maximum, minimum, average etc. It is analogous to an aggregate function in an SQL query.
You can combine dimensions and metrics in your graphs and tables.
The following are available:
- Tables (regular and pivot)
- Scorecards – elements with a single metric
Filtering in DS is interpreted in the background as editing the data source SQL. Consequently, you will receive the same data whether you apply a filter when data enters DS (definition of the SQL query) or on the report element level (a graph’s or table’s hidden filter), or if the person viewing the report applies the filter (user filter = control element). Filters will always be translated into SQL queries in BQ.
However, you can achieve faster loading times by applying filters correctly. It is recommended to define a specific SQL query for the entire data source and receive only relevant data rather than requesting the entire table and filtering on the report level. This also saves time when loading and managing available components during the report editing process.
Other elements you can add to your reports:
- External content – embedded URL
- Parameters (time period, visitor e-mail address, own parameters)
Since the reports are at a specific URL, it is possible to embed reports into your online documents, web pages etc.
DataStudio – accesses
There are two different types of accesses:
Users with View access cannot make any changes to the layout or settings of the report. They can only interact with the report and usually export it to PDF, print it or create a copy (which they can then edit, i.e. also find out how it was created).
Users with Edit access can edit report content and change its settings. The report owner can manage other access properties such as preventing other users from adding more users.
You can also allow access using a link. This would change the above to:
- Link sharing off – only the specified users will have access
- Link sharing on – the specified users have access, plus:
- 2.1. Anyone in a group with the link can view (e.g. only users in your company)
- 2.2. Anyone in a group with the link can edit
- 2.3. Anyone with the link can view
- 2.4. Anyone with the link can edit
- 2.5. Anyone on the internet can find the report and view it (even without the link)
- 2.6. Anyone on the internet can find the report and edit it (even without the link)
If you want to avoid having to set up different levels of access to your reports, you can set it to 2.3 or 2.5, but we don’t recommend this if your reports contain any sensitive information. It is not possible to view specific report traffic – Google Analytics will only display anonymised information.
Access to source data is described above.
To access the current and interactive version of a report, a user needs to know either:
- Its URL, or
- The exact title of the report and its owner. Optionally, a PDF report can contain a link to the online or embedded version of the report.
If the user knows the report URL and has access to the report, the browser will display it. If they don’t have access, a dialog box will open, informing them that they need to ask the owner for access. The user can select what type of access they want to request. The owner will receive a notification and once they grant access, the user will be able to open the report. If the user doesn’t know the report URL, they need to contact the report owner another way. The owner then needs to manually find the report and grant the user access to it.
Reports in PDF can be password protected, but they can be shared like any other file type. To prevent unauthorised sharing, the report owner can prevent other users from exporting the report to PDF. This also turns off Printing and Copying.
Finally, you can also use distributed report sharing using periodically sent automatically generated emails that contain:
- a link to the up-to-date version of the report
- a report preview (usually the first few pages)
- an attachment with a PDF copy of the report with data from the day the report is sent
End recipients can forward the email to others. If any users don’t have access to the report, when they open the report URL, they will be able to request access. They will still be able to open the PDF report.
This method can however be useful if you need to distribute the report to people outside your organisation.