Traders often calculate correlation between different instruments, such as stocks and ETFs, or Forex currency pairs. It is important to know if you portfolio is properly diversified. Highly correlated instruments in your portfolio will tend to go up and down together compromising your diversification strategy. Keeping an eye for high correlations (positive or negative) is even more important to Forex traders, since currency pairs often exhibit high positive or negative correlations due to market conditions or having similar market drivers. For example commodity producers such as AUD, NZD, CAD tend to react similarly to changes in commodity prices. Two positions with high negative correlation will basically cancel each other out and may not make sense to keep.
Correlation coefficient is a measure of correlation. It is calculated as a floating-point value between -1.0 and 1.0. Instruments with correlation values approaching 1.0 are called “positively correlated”, meaning they tend to move together. Symbols with correlation values approaching -1.0 are called “negatively correlated” and usually move in opposite directions. Correlation values close to zero usually mean that instruments are not related to each other and have been moving independently in the time period reflected in correlation computation.
Calculating Correlations – Example
In this post we’ll be calculating correlations in Python. More specifically we’ll calculate correlation indices for a set of currency pairs for a period of 6 months based on 15-minute price bars from end of January-through end of July of this year. Prices are stored in plain-text comma-separated (csv) files. Each file has: open, high, low, and close prices, as well as barSize, and date columns. The date column value actually contains the date and time of the bar size. Each line represents data for bar.
For this post we downloaded historical prices for nine Forex currency pairs from Interactive Brokers TWS using our IB Data Downloader, however price data can be obtained any number of ways, which is really not important for this example. Sample data files used in this example can be downloaded here: historical_data_forex_jul2015. Each file contains a little more than 12000 rows.
In the program example we’ll be using Pandas– and open source high-performance data analysis library for Python. Pandas extends NumPy functionality and provides flexible and powerful data structures similar to those found in R. We’ll be using Panda’s powerful DataFrame class, which is modeled after R’s data.frame, just way more powerful. You can think of a DataFrame as a table in a relational database, or an Excel spreadsheet, where each column has a name and an index (0-based), and an associated data type, such as string, date, integer, etc… Different column can have different data types.
We copied all our input files into directory /historical_data :
user$ ls -l total 15280 -rw-r--r-- 1 user wheel 685743 Aug 5 14:19 historical_data_AUDJPY__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 744849 Aug 5 14:19 historical_data_AUDUSD__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 693465 Aug 5 14:19 historical_data_CADJPY__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 741056 Aug 5 14:19 historical_data_EURCAD__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 736187 Aug 5 14:19 historical_data_EURJPY__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 760833 Aug 5 14:19 historical_data_EURUSD__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 736543 Aug 5 14:19 historical_data_GBPUSD__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 741383 Aug 5 14:19 historical_data_USDCHF__MIDPOINT_15mins_20150729-201600.txt -rw-r--r-- 1 user wheel 759156 Aug 5 14:19 historical_data_USDJPY__MIDPOINT_15mins_20150729-201600.txt user$
As you can see each file contains the name of the Forex currency pair in it name, such as: AUDJPY, AUDUSD, EURUSD, etc… These are important because we will be parsing out those currency pair names to use as columns in the joined DataFrame.
All files have the same format, here is a sample showing first 10 lines of AUDJPY data.:
user$ head -n10 historical_data_AUDJPY__MIDPOINT_15mins_20150729-201600.txt open,high,low,close,barSize,date 92.655,92.6725,92.3425,92.3425,MIN_15,2015-01-29 00:00:00 92.3425,92.465,92.315,92.335,MIN_15,2015-01-29 00:15:00 92.335,92.38,92.25,92.28,MIN_15,2015-01-29 00:30:00 92.28,92.405,92.2225,92.235,MIN_15,2015-01-29 00:45:00 92.235,92.2475,92.1225,92.155,MIN_15,2015-01-29 01:00:00 92.155,92.155,92.015,92.06,MIN_15,2015-01-29 01:15:00 92.06,92.085,91.9975,92.005,MIN_15,2015-01-29 01:30:00 92.005,92.1775,91.98,92.16,MIN_15,2015-01-29 01:45:00 92.16,92.195,92.02,92.04,MIN_15,2015-01-29 02:00:00 user$
Line 2 above (first row of the file) contains column names. As you can see each row represents a 15-minute price data bar, starting on midnight Jan 29, 2015.
Calculating Correlation in Python
Python source code
In order to run this python program you need to make sure you have required packages installed.
You can refer to each package’s installation instructions (we provided links below, but a simple Google search should be sufficient). Below is a sequence of commands we used to install packages on our MacOS Yosemite with Python 2.7 :
> sudo easy_install pip > sudo pip install pandas > sudo pip install matplotlib > sudo pip install seaborn
We tried to document this program as well as we can, but feel free to post questions at the bottom of this page, or send them to us via the Contact Us form on the right.
import os import glob import numpy as np import pandas as pd import matplotlib.pyplot as plt import seaborn # These settings modify the way pandas prints data stored in a DataFrame. # In particular when we use print(data_frame_reference); function - all # column values of the frame will be printed in the same row instead of # being automatically wrapped after 6 columns by default. This will be # for looking at our data at the end of the program. pd.set_option('display.height', 1000) pd.set_option('display.max_rows', 500) pd.set_option('display.max_columns', 500) pd.set_option('display.width', 1000) # Using glob library to create a list of file names using regular expression. datafiles= glob.glob("/historical_data/historical_data_*.txt") # Create an empty python dictionary which will contain currency pairs' data. # Keys will be currency pair names, and values- 'pandas' data frames with # close prices read from the files. dataframes = dict() # In the following loop we'll read each file into a pandas data frame and # store them in the 'dataframes' dictionary. # for each file... for f in datafiles: #parse currency pair name from the file name pair_name = f.split(os.path.sep)[-1].split('_') # Using read_csv function read file into a DataFrame 'df'. # Notice that we are reading only two columns from each file: 'date', and 'close'. # We'll be using 'date' to index each record in data frame (think of it a a primary # key value for the record) , and the close price will be used to calculate correlations. df = pd.read_csv(f, sep=',', header=0, index_col=["date"], usecols=["date", "close"]) # Rename 'close' column the the currency pair name pair. # This will help us identify each pair's close price below when we join all # data frames into a single fame. df.columns = [pair_name] # Read each of files into a pandas data frame. dataframes[pair_name] = df # In this section we'll join all data frames create above into a single 'final_df' # data frame. This data frame will contain a single 'date' column, and 1 column # for each currency pair containing that pair's close prices. final_df = None for k,v in dataframes.iteritems(): if (final_df is None): final_df = v else: # Panda's join operation is similar to an SQL join. In this case # we are using a 'left' join. #http://pandas.pydata.org/pandas-docs/stable/merging.html final_df = final_df.join(v, how='left') print("--------------- FINAL DATA FRAME ---------------") print(final_df.head(10)) # And now.. the "hard" part- calculating correlations between pairs. # DataFrames corr() function calculates pairwise correlations using specified # algorithm: 'peason, 'kendall', and 'spearman' are supported. # Correlations are returned in a new DataFrame instance (corr_df below). corr_df = final_df.corr(method='pearson') print("--------------- CORRELATIONS ---------------") print(corr_df.head(len(dataframes))) print("--------------- CREATE A HEATMAP ---------------") # Create a mask to display only the lower triangle of the matrix (since it's mirrored around its # top-left to bottom-right diagonal). mask = np.zeros_like(corr_df) mask[np.triu_indices_from(mask)] = True # Create the heatmap using seaborn library. # List if colormaps (parameter 'cmap') is available here: http://matplotlib.org/examples/color/colormaps_reference.html seaborn.heatmap(corr_df, cmap='RdYlGn_r', vmax=1.0, vmin=-1.0 , mask = mask, linewidths=2.5) # Show the plot we reorient the labels for each column and row to make them easier to read. plt.yticks(rotation=0) plt.xticks(rotation=90) plt.show()
Below is the output of the final print(corr_df.head(len(dataframes))) function, showing correlation coefficient values in matrix format.
--------------- CORRELATIONS --------------- CADJPY USDCHF GBPUSD EURJPY EURCAD AUDUSD EURUSD USDJPY AUDJPY CADJPY 1.000000 -0.465311 0.497829 0.523184 -0.347024 0.340684 0.240703 0.483433 0.844388 USDCHF -0.465311 1.000000 -0.618069 -0.707785 -0.356091 -0.385425 -0.864339 0.020319 -0.431847 GBPUSD 0.497829 -0.618069 1.000000 0.832596 0.459720 -0.008430 0.641919 0.454782 0.399726 EURJPY 0.523184 -0.707785 0.832596 1.000000 0.617568 -0.042915 0.757395 0.561827 0.460505 EURCAD -0.347024 -0.356091 0.459720 0.617568 1.000000 -0.356498 0.616300 0.165831 -0.272327 AUDUSD 0.340684 -0.385425 -0.008430 -0.042915 -0.356498 1.000000 0.393271 -0.568283 0.657461 EURUSD 0.240703 -0.864339 0.641919 0.757395 0.616300 0.393271 1.000000 -0.114520 0.354258 USDJPY 0.483433 0.020319 0.454782 0.561827 0.165831 -0.568283 -0.114520 1.000000 0.246213 AUDJPY 0.844388 -0.431847 0.399726 0.460505 -0.272327 0.657461 0.354258 0.246213 1.000000
Note that a negative correlation means the two currency pairs correlate in the opposite directions (e.g. when the price for one goes up, the other one goes down and vice versa).
Food for thought.
Traders should pay attention to correlation values on several timeframes. Short-term day traders are often advised to monitor longer-timeframe charts to be aware of the direction of the larger trend. Similarly, when looking at correlations between currency pairs traders should check not only correlations for their signal-generating timeframe, but also one or two longer-term timeframes, since, just like price trends, longer-term correlations are much more significant.
In conclusion, we’d like to acknowledge the wonderful book we reference below – “Python for Data Analysis”. It is a great source of detailed information on pandas and NumPy Python libraries. We’ve saved countless hours while coding Python scripts keeping this book on our desk.