Calculating Correlations of Forex Currency Pairs in Python

Traders often calculate correlation between different instruments, such as stocks and ETFs, or Forex currency pairs. It is important to know if you portfolio is properly diversified. Highly correlated instruments in your portfolio will tend to go up and down together compromising your diversification strategy. Keeping an eye for high correlations (positive or negative) is even more important to Forex traders, since currency pairs often exhibit high positive or negative correlations due to market conditions or having similar market drivers. For example commodity producers such as AUD, NZD, CAD tend to react similarly to changes in commodity prices. Two positions with high negative correlation will basically cancel each other out and may not make sense to keep.

Correlation coefficient is a measure of correlation. It is calculated as a floating-point value between -1.0 and 1.0. Instruments with correlation values approaching 1.0 are called “positively correlated”, meaning they tend to move together. Symbols with correlation values approaching -1.0 are called “negatively correlated” and usually move in opposite directions. Correlation values close to zero usually mean that instruments are not related to each other and have been moving independently in the time period reflected in correlation computation.

Calculating Correlations – Example

In this post we’ll be calculating correlations in Python. More specifically we’ll calculate correlation indices for a set of currency pairs for a period of 6 months based on 15-minute price bars from end of January-through end of July of this year. Prices are stored in plain-text comma-separated (csv) files. Each file has: open, high, low, and close prices, as well as barSize, and date columns. The date column value actually contains the date and time of the bar size. Each line represents data for bar.

For this post we downloaded historical prices for nine Forex currency pairs from Interactive Brokers TWS using our IB Data Downloader, however price data can be obtained any number of ways, which is really not important for this example. Sample data files used in this example can be downloaded here: historical_data_forex_jul2015. Each file contains a little more than 12000 rows.

In the program example we’ll be using Pandas– and open source high-performance data analysis library for Python. Pandas extends NumPy functionality and provides flexible and powerful data structures similar to those found in R. We’ll be using Panda’s powerful DataFrame class, which is modeled after R’s data.frame, just way more powerful. You can think of a DataFrame as a table in a relational database, or an Excel spreadsheet, where each column has a name and an index (0-based), and an associated data type, such as string, date, integer, etc… Different column can have different data types.

We copied all our input files into directory /historical_data :

user$ ls -l
total 15280
-rw-r--r--  1 user  wheel   685743 Aug  5 14:19 historical_data_AUDJPY__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   744849 Aug  5 14:19 historical_data_AUDUSD__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   693465 Aug  5 14:19 historical_data_CADJPY__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   741056 Aug  5 14:19 historical_data_EURCAD__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   736187 Aug  5 14:19 historical_data_EURJPY__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   760833 Aug  5 14:19 historical_data_EURUSD__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   736543 Aug  5 14:19 historical_data_GBPUSD__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   741383 Aug  5 14:19 historical_data_USDCHF__MIDPOINT_15mins_20150729-201600.txt
-rw-r--r--  1 user  wheel   759156 Aug  5 14:19 historical_data_USDJPY__MIDPOINT_15mins_20150729-201600.txt
user$ 

As you can see each file contains the name of the Forex currency pair in it name, such as: AUDJPY, AUDUSD, EURUSD, etc… These are important because we will be parsing out those currency pair names to use as columns in the joined DataFrame.

All files have the same format, here is a sample showing first 10 lines of AUDJPY data.:

user$ head -n10 historical_data_AUDJPY__MIDPOINT_15mins_20150729-201600.txt 
open,high,low,close,barSize,date
92.655,92.6725,92.3425,92.3425,MIN_15,2015-01-29 00:00:00
92.3425,92.465,92.315,92.335,MIN_15,2015-01-29 00:15:00
92.335,92.38,92.25,92.28,MIN_15,2015-01-29 00:30:00
92.28,92.405,92.2225,92.235,MIN_15,2015-01-29 00:45:00
92.235,92.2475,92.1225,92.155,MIN_15,2015-01-29 01:00:00
92.155,92.155,92.015,92.06,MIN_15,2015-01-29 01:15:00
92.06,92.085,91.9975,92.005,MIN_15,2015-01-29 01:30:00
92.005,92.1775,91.98,92.16,MIN_15,2015-01-29 01:45:00
92.16,92.195,92.02,92.04,MIN_15,2015-01-29 02:00:00
user$ 

Line 2 above (first row of the file) contains column names. As you can see each row represents a 15-minute price data bar, starting on midnight Jan 29, 2015.

Calculating Correlation in Python

Python source code

In order to run this python program you need to make sure you have required packages installed.
You can refer to each package’s installation instructions (we provided links below, but a simple Google search should be sufficient). Below is a sequence of commands we used to install packages on our MacOS Yosemite with Python 2.7 :

> sudo easy_install pip
> sudo pip install pandas
> sudo pip install matplotlib
> sudo pip install seaborn

We tried to document this program as well as we can, but feel free to post questions at the bottom of this page, or send them to us via the Contact Us form on the right.


import os
import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn

# These settings modify the way  pandas prints data stored in a DataFrame.
# In particular when we use print(data_frame_reference); function - all
#  column values of the frame will be printed in the same  row instead of
# being automatically wrapped after 6 columns by default. This will be
# for looking at our data at the end of the program.
pd.set_option('display.height', 1000)
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# Using glob library to create a list of file names using regular expression.
datafiles= glob.glob("/historical_data/historical_data_*.txt")

# Create an empty python dictionary which will contain currency pairs' data.
# Keys will be currency pair names, and values- 'pandas' data frames with 
# close prices read from the files.
dataframes = dict()
# In the following loop we'll read each file into a pandas data frame and 
# store them in the 'dataframes' dictionary.
# for each file...
for f in datafiles:
    #parse currency pair name from the file name
    pair_name = f.split(os.path.sep)[-1].split('_')[2]
    # Using read_csv function read file into a DataFrame 'df'.

    # Notice that we are reading only two columns from each file: 'date', and 'close'. 
    # We'll be using 'date' to index each record in data frame (think of it a a primary
    # key value for the record) , and the close price will be used to  calculate correlations. 
    df = pd.read_csv(f, sep=',', header=0, index_col=["date"], usecols=["date", "close"])
    # Rename 'close' column the the currency pair name pair.
    # This will help us identify each pair's close price below when we join all 
    # data frames into a single fame.
    df.columns = [pair_name]
    
    # Read each of files into a pandas data frame.
    dataframes[pair_name] = df
    
    
# In this section we'll join all data frames create above into a single 'final_df' 
# data frame. This data frame will contain a single 'date' column, and 1 column 
# for each currency pair containing that pair's close prices.
final_df = None
for k,v in dataframes.iteritems():
    if (final_df is None):
        final_df = v
    else:
        # Panda's join operation is similar to an SQL join. In this case
        # we are using a 'left' join.
        #http://pandas.pydata.org/pandas-docs/stable/merging.html
        final_df = final_df.join(v, how='left')
        
print("--------------- FINAL DATA FRAME ---------------")
print(final_df.head(10))

# And now.. the "hard" part- calculating correlations between pairs.
# DataFrames corr() function calculates pairwise correlations using specified 
# algorithm: 'peason, 'kendall', and 'spearman' are supported.
# Correlations are returned in a new DataFrame instance (corr_df below).
corr_df = final_df.corr(method='pearson')
print("--------------- CORRELATIONS ---------------")
print(corr_df.head(len(dataframes)))

print("--------------- CREATE A HEATMAP ---------------")
# Create a mask to display only the lower triangle of the matrix (since it's mirrored around its 
# top-left to bottom-right diagonal).
mask = np.zeros_like(corr_df)
mask[np.triu_indices_from(mask)] = True
# Create the heatmap using seaborn library. 
# List if colormaps (parameter 'cmap') is available here: http://matplotlib.org/examples/color/colormaps_reference.html
seaborn.heatmap(corr_df, cmap='RdYlGn_r', vmax=1.0, vmin=-1.0 , mask = mask, linewidths=2.5)

# Show the plot we reorient the labels for each column and row to make them easier to read.
plt.yticks(rotation=0) 
plt.xticks(rotation=90) 
plt.show()



Below is the output of the final print(corr_df.head(len(dataframes))) function, showing correlation coefficient values in matrix format.

--------------- CORRELATIONS ---------------
          CADJPY    USDCHF    GBPUSD    EURJPY    EURCAD    AUDUSD    EURUSD    USDJPY    AUDJPY
CADJPY  1.000000 -0.465311  0.497829  0.523184 -0.347024  0.340684  0.240703  0.483433  0.844388
USDCHF -0.465311  1.000000 -0.618069 -0.707785 -0.356091 -0.385425 -0.864339  0.020319 -0.431847
GBPUSD  0.497829 -0.618069  1.000000  0.832596  0.459720 -0.008430  0.641919  0.454782  0.399726
EURJPY  0.523184 -0.707785  0.832596  1.000000  0.617568 -0.042915  0.757395  0.561827  0.460505
EURCAD -0.347024 -0.356091  0.459720  0.617568  1.000000 -0.356498  0.616300  0.165831 -0.272327
AUDUSD  0.340684 -0.385425 -0.008430 -0.042915 -0.356498  1.000000  0.393271 -0.568283  0.657461
EURUSD  0.240703 -0.864339  0.641919  0.757395  0.616300  0.393271  1.000000 -0.114520  0.354258
USDJPY  0.483433  0.020319  0.454782  0.561827  0.165831 -0.568283 -0.114520  1.000000  0.246213
AUDJPY  0.844388 -0.431847  0.399726  0.460505 -0.272327  0.657461  0.354258  0.246213  1.000000

Note that a negative correlation means the two currency pairs correlate in the opposite directions (e.g. when the price for one goes up, the other one goes down and vice versa).

Heatmap

Below is the screenshot of the heatmap generated by the above code. Notice that you will need to install matplotlib, and seaborn Python packages.

Heatmap with results of calculating correlation in Python

Food for thought.

Traders should pay attention to correlation values on several timeframes. Short-term day traders are often advised to monitor longer-timeframe charts to be aware of the direction of the larger trend. Similarly, when looking at correlations between currency pairs traders should check not only correlations for their signal-generating timeframe, but also one or two longer-term timeframes, since, just like price trends, longer-term correlations are much more significant.

In conclusion, we’d like to acknowledge the wonderful book we reference below – “Python for Data Analysis”. It is a great source of detailed information on pandas and NumPy Python libraries. We’ve saved countless hours while coding Python scripts keeping this book on our desk.

Posted in Data Analysis, Programming examples
2 comments on “Calculating Correlations of Forex Currency Pairs in Python
  1. Thank you for the insightful advice related to high correlations by the way!

  2. John says:

    Just wanted to point out that this a great article. I’m new to python, started learning it only a couple of months ago and pages like this really help make learning curve much smoother. Thanks for the detailed comments in your code, great job!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

IB Data Downloader


ib_downloader_3_downconf ib_downloader_3_options

IB Excel Trader

  • IB Excel Trader version 1.6 is now available!
  • Trade Stocks, ETFs, Futures, and Forex directly from Excel.
  • Implement custom trading rules using spreadsheet formulas or VBA.
  • Program entry rules for single or bracket exit orders. Market, Stop, Limit, Stop-Limit, as well as complex algo orders are supported.
  • Order Log sheet (new!). Contains a detailed list of each order status change in a filterable Excel table.
  • Use our Customization Service to extend IB Excel Trader and contract our programmers to develop your custom trading strategies.
IB-Excel-Trader-1-5_wTWS-2

Interactive Brokers (IB) is a low cost provider of trade execution and clearing services for individuals, advisors, prop trading groups, brokers and hedge funds. IB's premier technology provides direct access to stocks, options, futures, forex, bonds and funds on over 100 markets worldwide from a single IB Universal account.

Member NYSE, FINRA, SIPC. Visit www.interactivebrokers.com for more information.

Contact Us!

Please enter your contact details and a short message below and we will respond to your message shortly.