Google Sheets has a built-in function called ImportXML which can be used to scrape publicly available structured data from webpages. ImportXML imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.
In this post we’ll show you how to use the ImportXML function to aggregate Pinterest followers from different Pinterest accounts into a Google Sheet, and then visualize that information using Geckoboard’s Spreadsheets data source.
Syntax of the ImportXML function
The syntax of the ImportXML function is:
URL: The URL of the page to examine, including protocol (e.g. http://).
The value for URL must either be enclosed in quotation marks
“” or be a reference to a cell containing the appropriate text.
XPath query: XPath is an expression language used to specify parts of an XML document. Here you’d enter the XPath query to run on the data given at the URL. The value for Xpath query must be enclosed in quotation marks
“” Each result from the XPath query is placed in its own row of the spreadsheet.
Common XPath queries include:
//h1– Scrapes all h1s
//title– Scrapes all titles
//meta[@name='description']/@content– Scrapes the meta description
//@href– Scrapes all links
//link[@rel=’canonical’]/@href– Scrapes any canonicals
//*[@itemtype]/@itemtype– Scrapes types of schema
//*[@hreflang]– Scrapes hreflang
Tip: Google Chrome includes the ability to copy Xpaths as part of its DevTools feature. To access it, inspect the element you’re interested in by right-clicking it and selecting Inspect. In the elements panel, right-click the element you’ve highlighted and Copy > Copy XPath.
Example: Importing Twitter follower data into a Google Sheet
Let’s use a new tab in Chrome to ImportXML to display Geckoboard’s Pinterest follower count in a Google Sheet:
- In Chrome, navigate to Geckoboard’s Pinterest profile: https://www.pinterest.co.uk/geckoboard/.
- From the menu, select View > Developer > View Source.
- We’re searching for the Follower count in amongst all the code, so click inside the code that appears and press command + F (or Control + F) and search for "followers". You’ll see that inside a
<meta property>tag is the value
"pinterestapp:followers". This property is the unique identifier for the number of Pinterest followers Geckoboard has.
- To pull out this data we’ll use the XPath query
//meta[@property='pinterestapp:followers']/@content. This finds all the
<meta property>tags in the source code and imports any with a
pinterestapp:followersvalue. In our example this value is only referenced once in the source, so it will only import one result into our Google Sheet.
- Switch to Google Sheets and create a new sheet.
- In cell A1, enter “Pinterest” to explain what data will be found in the adjacent cells.
- In cell B1, we’ll type our ImportXML function. This is made up of the URL of the page to examine,
https://www.pinterest.com/geckoboard/, and the XPath query that we want to run on that page
The full function should look like this:
- Repeat the above step for all the Pinterest accounts you want to aggregate in the same Google Sheet.
Through some tweaking of this basic format you can take your ImportXML data further by scraping publicly available information from other sites. For example…
The latest submissions on ProductHunt
Current Air Quality Index for San Francisco
Building a dashboard using ImportXML
Once you’ve got data into a Google sheet using the ImportXML function above, creating a live dashboard that visualizes it is straightforward using Geckoboard, our easy-to-use dashboard software.
To create a dashboard like the one below, sign up to Geckoboard using this link, and select the Google Sheets data source to authorize Geckoboard to access the data in your spreadsheet.
From there, select the data you want to visualize from your Google Sheet, and select a visualization. You can do this several times using different data in your Google Sheet to build out a fully-fledged dashboard, or pull data directly from other places like Google Analytics using the other data sources that are available.