Blog Post

High Frequency Data for investors and policymakers


Date: June 23, 2011 Topic: Banking and capital markets

What’s at stake

“The sexiest job in the next 10 years will be statistician,” says Hal Varian, “and I’m not kidding.” With increased connectivity, Internet usage and data availability, a new world of statistical analysis has indeed opened up. Whether the draw is very high frequency data or its sheer volume (big data), there is a growing appetite for forecasting (or nowcasting) social and economic behavior in real time. The possibilities are nearly limitless and the field is still in its infancy, but a few firms and research groups are paving the way for work that could be extremely relevant to investors, policymakers and researchers alike.

Making policy in the dark

In an article for the New Yorker, James Surowiecki explains how essential this kind of granular data is to the public as well as to economic policymakers. In the early years of the Great Depression, the government had few good figures on economic activity or unemployment to go on. As a result, policymakers persistently underestimated the severity of the crisis. In June of 1930, relying on some anecdotal evidence of an upturn, Herbert Hoover even announced that “the Depression was over”. Today, our picture of the economy is more detailed and sophisticated than ever, and that makes it easier for businesses and the government to react quickly to changes in the economy. And yet our picture of what’s going on is far from perfect. The government continues to track inflation, for instance, by gathering price data much as it did in the nineteen-fifties: it surveys consumers by phone to see where they buy, surveys businesses to see how much they charge, and checks out shopping malls to price goods.

Harvard professor Gary King describes the contours and implications of the “Social Science Data Revolution”, characterized by the transition from the production and availability of a relatively small volume of analog data collected through surveys and official channels to those of massive amounts of digital data flowing from various streams (mobile-phone logs, EZ pass transponders, IP addresses, real estate purchases, credit card transactions, electronic medical records, online news and searches, satellite imagery, and “social everything”: blogs, comments, networking sites, etc.).

In a recent paper, Dirk Helbing and Stefano Balietti discuss the power of massive mining of high-frequency socio-economic data (“reality mining”) and its application to forecasting socio-economic crises with the help of new analytical tools, including pattern recognition algorithms and machine learning approaches for studying complex social systems.

McKinsey Global Institute, a shop that has recently come under heavy criticism, to say the least, also published an interesting report last month that examines the state of digital data. The use of data has long been part of the impact of information and communication technology, but the scale and scope of the changes that big data is bringing about are today at an inflection point as a number of trends converge.

Measuring inflation on a daily basis

Alberto Cavallo, the founder of InflacionVerdadera, a website that provides daily inflation statistics for Argentina, and Roberto Rigobon of the MIT Sloan School founded the Billion Prices Project at MIT. It is a far-reaching academic initiative that uses prices collected from hundreds of online retailers on a daily basis to conduct economic research. This high-frequency, item-level data is an extremely powerful tool for studying pricing behavior, inflation, asset prices and pass-through. In the U.S., the project collects more than half a million prices daily, five times the number that the government looks at. After Lehman Brothers went under, in September 2008, the project’s data showed that businesses started cutting prices almost immediately, which suggested that demand had collapsed, while the government’s numbers only started to show this deflationary pressure in November.

Hal Varian presented the Google Price Index (GPI) at a business economists’ conference back in October 2010. Varian emphasized that the GPI is not a direct replacement for the CPI because the mix of goods sold on the web differs from the mix in the wider economy. Housing accounts for about 40 percent of the US CPI, for example, but only 18 percent of the GPI. The GPI shows a “pretty good correlation” with the CPI for goods such as cameras and watches that are often sold on the web, but less so for others, such as car parts, that are infrequently traded online.

High frequency data for economic analysis

Hyunyoung Choi and Hal Varian argue in a 2009 paper that, by pooling searches in categories, Google Trends data can help predict initial claims for unemployment benefits in the United States. Askitas and Zimmermann (2009), Tanya Suhoy (2009) and Francesco D’Amuri and Juri Marcucci have examined similar unemployment data for Germany, Israel and Italy respectively, and also found significant improvements in forecasting accuracy from using Google Trends.
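The general idea of adding a search index to a simple forecasting model can be sketched as follows. This is a minimal illustration on synthetic data, not the specification used in the papers cited above: the series `claims` and `search`, the coefficients used to generate them, and the helper names `solve_linear_system`, `ols` and `rss` are all assumptions made for the example. It compares an autoregressive baseline (this period's claims explained by last period's) with the same model augmented by the current period's search index.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients; each row already holds 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def rss(rows, y, beta):
    """Residual sum of squares of the fitted model."""
    return sum(
        (yi - sum(b * x for b, x in zip(beta, r))) ** 2 for r, yi in zip(rows, y)
    )


# Synthetic data (assumption): claims depend on their own lag and on a
# contemporaneous search index that is observable before official figures.
rng = random.Random(1)
n = 200
search = [50 + rng.gauss(0, 10) for _ in range(n)]
claims = [100.0]
for t in range(1, n):
    claims.append(
        50 + 0.5 * claims[t - 1] + 0.4 * (search[t] - 50) + rng.gauss(0, 2)
    )

y = claims[1:]
base_rows = [[1.0, claims[t - 1]] for t in range(1, n)]            # baseline
aug_rows = [[1.0, claims[t - 1], search[t]] for t in range(1, n)]  # + search

beta_base = ols(base_rows, y)
beta_aug = ols(aug_rows, y)
rss_base = rss(base_rows, y, beta_base)
rss_aug = rss(aug_rows, y, beta_aug)

print("baseline RSS: ", round(rss_base, 1))
print("augmented RSS:", round(rss_aug, 1))
print("estimated search coefficient:", round(beta_aug[2], 3))
```

Because the synthetic claims series is built to depend on the search index, the augmented model fits better by construction. With real data the gain is an empirical question and is best judged out of sample.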

Nick McLaren and Rachana Shanbhogue from the Bank of England are among the few central bank researchers openly interested in the possibilities offered by online searches (in particular by the function) and their predictive power for housing markets and unemployment. Initial results suggest that Internet search data can help predict changes in unemployment in the United Kingdom. For house prices, the results are stronger: search term variables outperform existing indicators over the period since 2004. There is also evidence that these data may be used to provide additional insight on a wider range of issues which traditional business surveys might not cover.

Tanya Suhoy from the Bank of Israel warns of important shortcomings: the dynamics of query indices may be non-stationary, and their predictive ability may vary as agents turn to alternative social searches that Google cannot track.

Google has come up with a new tool, Google Correlate, which finds correlations between whatever data you want to plug into it and whatever people are searching for on Google. Real Time Economics gave it a try and the results are somewhat surprising. The Fed’s balance sheet has, for example, an extremely high correlation of 0.9605 with searches for “nausea remedies.” Maybe quantitative easing hasn’t been good for America, but it’s been good for Dramamine sales! (Stats geeks, we hear your complaint: “Correlation doesn’t imply causation.” Did you know that there’s a nearly perfect correlation between people who say that and people who are party poopers?) But wait: the correlation between the size of the balance sheet and searches for “how to get over a guy” is even higher, at 0.9726.
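One likely reason such correlations come out so high is that both series simply trend upward over the sample. The sketch below illustrates this with two made-up series that share nothing but a drift; the names `balance_sheet` and `search_index` are labels for synthetic numbers, not actual Fed or Google data. The levels are almost perfectly correlated, while the period-to-period changes are close to uncorrelated.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def first_differences(xs):
    """Period-to-period changes of a series."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)
n = 200

# Synthetic series (assumption): independent noise around two upward trends.
balance_sheet = [0.5 * t + rng.gauss(0, 2) for t in range(n)]
search_index = [0.3 * t + rng.gauss(0, 2) for t in range(n)]

corr_levels = pearson(balance_sheet, search_index)
corr_changes = pearson(
    first_differences(balance_sheet), first_differences(search_index)
)

print("correlation of levels: ", round(corr_levels, 3))
print("correlation of changes:", round(corr_changes, 3))
```

Differencing is only one simple way to remove a common trend. It also speaks to the non-stationarity concern raised by Suhoy above.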

Fighting influenza and quantifying human movements

Jeremy Ginsberg and his coauthors have studied influenza epidemics, for which early prevention is key to saving lives. They conclude that health-seeking behavior in the form of online queries to search engines, which are submitted by millions of users around the world each day, is a useful guide to health concerns. They report that they can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day. This approach may make it possible to use search queries to detect influenza epidemics in areas with a large population of web search users. The Google Flu Trends approach has now been extended to the detection of dengue outbreaks via Google Dengue Trends.

A team of Boston-based network physicists found that they were able to predict an individual’s whereabouts with over 93% accuracy based on cell-phone-generated information on past movements. Nathan Eagle notes that mobile phones give researchers the ability to quantify human movement on a scale that wasn’t possible before.

Detecting digital smoke signals in developing countries

A particularly interesting avenue is the application of these tools and approaches in developing countries. According to Gavin Krugel, director of mobile banking strategy at GSM Association, an industry group of 800 wireless operators, one billion consumers in the world have a mobile phone but no access to a bank account. About 40 million people worldwide use mobile money, and the industry is growing with 18,000 new mobile banking users per day in Uganda, 15,000 in Tanzania and 11,000 in Kenya. Mobile phones are routinely used for banking services, including payments, money transfers and savings, but also for the transmission of other pieces of information such as grades and other test results, stocks and prices of various goods in different markets, medical appointments, etc. A similar phenomenon is also visible and expected to persist for Internet traffic: according to a study, while Internet traffic growth is expected to be in the realm of 25-35% in North America, Western Europe and Europe, it may reach or surpass 50% in Latin America and the Middle East and Africa.

Mobile phones are also used to purposefully generate high-frequency data in crisis settings through a technique known as “crowdsourcing” or “crowdvoicing”. The technique received widespread attention in the aftermath of the 2010 earthquake in Haiti with the work of Ushahidi, a non-profit tech company that set up a text messaging system allowing cell-phone owners to report on collapsed buildings in real time. A high correlation was found between reports of damaged buildings via text message and actual damage on the ground, which prompted Ushahidi’s Patrick Meier to conclude that data collected using unbounded crowdsourcing (non-representative sampling), largely in the form of SMS from the disaster-affected population in Port-au-Prince, can predict, with surprisingly high accuracy and statistical significance, the location and extent of structural damage post-earthquake.

Global Pulse is a United Nations initiative set up in 2009 to detect “digital smoke signals”, signs of incipient harm at the community level, in new high frequency data. The rationale behind the project is that individuals and communities in developing countries change their collective behaviors in response to shocks in ways that leave trails in digital data such as mobile-phone banking transactions, access to programs and services, fertilizer use, etc.

Conceptual, analytical and operational challenges

Analyzing high-frequency data for research and policy purposes also runs into a number of challenges: conceptual, analytical and operational. Gary King, in another contribution, Ensuring the Data-Rich Future of Social Sciences, discusses the need to find a balance between the unprecedented increases in data production and availability about individuals and the privacy rights of human beings worldwide. In their paper, Dirk Helbing and Stefano Balietti also call for “privacy-preserving analyses”, including through the use of adequately anonymized data, deliberate participation and strong privacy-preserving technical systems and legal standards.

The complex and chaotic nature of high frequency data brings about specific analytical challenges for both researchers and policymakers. Jim Fruchterman discussed the potential problems with using text message-based systems in crisis contexts. The correlation found in Haiti illustrates a "confounding factor" at work. A correlation was found between building damage and SMS streams but, as pointed out by Kristian Lum, only because both were correlated with the simple existence of buildings. Thus the correlation between the SMS feed and the building damage is an artifact, or spurious correlation. In addition, Fruchterman reported that once you control for the presence of any buildings (damaged or undamaged), the text message stream seems to have a weak negative correlation with the presence of damaged buildings. That is, the presence of text messages suggests there are fewer (not more) damaged buildings in a particular area.
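The confounding argument can be made concrete with a partial correlation, which measures the association between two variables after removing what each owes to a third. The sketch below uses synthetic grid cells, not the Haiti data: the variables `buildings`, `damaged` and `sms`, and the numbers used to generate them, are assumptions chosen so that both damage and text messages depend only on building density. It therefore reproduces the spurious raw correlation, but not the weak negative correlation Fruchterman reported.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def partial_correlation(x, y, z):
    """Correlation of x and y after controlling for z (first-order formula)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5


rng = random.Random(42)
cells = 500

# Synthetic grid cells (assumption): damage and SMS volume each scale with
# the number of buildings, but have no direct link to one another.
buildings = [rng.uniform(0, 100) for _ in range(cells)]
damaged = [0.3 * b + rng.gauss(0, 3) for b in buildings]
sms = [0.5 * b + rng.gauss(0, 5) for b in buildings]

raw = pearson(sms, damaged)
controlled = partial_correlation(sms, damaged, buildings)

print("raw correlation (SMS vs damage):   ", round(raw, 3))
print("controlling for building density:  ", round(controlled, 3))
```

The raw correlation is high because dense areas have more of everything. Once building density is held fixed, the text messages carry almost no information about damage in this toy setup.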

Justin Ortiz and a team of medical experts compared data from Google Flu Trends from 2003 to 2008 with data from two distinct surveillance networks and found that Google Flu Trends did a very good job of predicting nonspecific respiratory illnesses (bad colds and other infections, like SARS, that seemed like the flu) but did not predict the flu itself very well. The mismatch stemmed from the fact that other infections can cause symptoms that resemble those of influenza, while influenza is not always associated with influenza-like symptoms. According to Ortiz, up to 40 percent of people with pandemic flu did not have "influenza-like illness" because they did not have a fever. Influenza-like illness is neither sensitive nor specific for influenza virus activity: it is a good proxy and a very useful public-health surveillance system, but it is not as accurate as actual nationwide specimens positive for influenza virus.

Bruegel Economic Blogs Review is an information service that surveys external blogs. It does not survey Bruegel’s own publications, nor does it include comments by Bruegel authors.

Republishing and referencing

Bruegel considers itself a public good and takes no institutional standpoint. Anyone is free to republish and/or quote this post without prior consent. Please provide a full reference, clearly stating Bruegel and the relevant author as the source, and include a prominent hyperlink to the original post.
