Ermin Hodžić

my personal website and blog


Constructing a Vector With a Specific Pearson Correlation

PDF (LaTeX)

This tutorial describes a solution to a problem I encountered while constructing synthetic data for simulations in a research project. Without going into much detail, part of the data that I needed to generate consisted of an n-dimensional vector and a separate matrix with n columns. Rather than generating the entries of the vector and the matrix uniformly at random, it was more useful to first decide (randomly) on a target Pearson correlation coefficient between each row of the matrix and the vector. Once a row has been assigned a target correlation coefficient, we would like to generate values in that row whose correlation with the vector equals that target.

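More formally, given a non-constant vector \(\vec{v}\) and a target Pearson correlation coefficient value \(P \in [-1, 1]\), the goal is to construct a vector \(\vec{u}\) such that \(r(\vec{u}, \vec{v}) = P\), where \(r(\vec{u},\vec{v})\) denotes the Pearson correlation coefficient between vectors \(\vec{u}\) and \(\vec{v}\). (The vector \(\vec{v}\) must not be constant, since the Pearson correlation coefficient is undefined when either vector has zero variance.)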
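To make the problem statement concrete, here is a minimal Python sketch of one standard way to build such a vector. It is not necessarily the construction described in the PDF: it centers \(\vec{v}\), draws random Gaussian noise, removes the part of the noise that lies along the centered \(\vec{v}\), and mixes the two unit vectors with weights \(P\) and \(\sqrt{1 - P^2}\). The function names `pearson` and `vector_with_correlation`, the choice of Gaussian noise, and the requirement of at least three entries are all assumptions made for this illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    return dot / math.sqrt(sum(x * x for x in ca) * sum(y * y for y in cb))


def vector_with_correlation(v, target, rng=None):
    """Return a vector u with pearson(u, v) == target (up to rounding).

    Hypothetical helper for illustration; assumes len(v) >= 3 and that
    v is not constant.
    """
    if not -1.0 <= target <= 1.0:
        raise ValueError("target must lie in [-1, 1]")
    n = len(v)
    if n < 3:
        raise ValueError("need at least 3 entries")
    rng = rng or random.Random()

    # Center v and scale it to unit length.
    mean_v = sum(v) / n
    c = [x - mean_v for x in v]
    norm_c = math.sqrt(sum(x * x for x in c))
    if norm_c == 0.0:
        raise ValueError("v must not be constant")
    c_hat = [x / norm_c for x in c]

    # Draw random noise and center it.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean_z = sum(z) / n
    z = [x - mean_z for x in z]

    # Remove the component of the noise along c_hat, then normalize.
    proj = sum(a * b for a, b in zip(z, c_hat))
    w = [a - proj * b for a, b in zip(z, c_hat)]
    norm_w = math.sqrt(sum(x * x for x in w))
    if norm_w < 1e-12:
        raise ValueError("degenerate noise draw; try another seed")
    w_hat = [x / norm_w for x in w]

    # Mix the two orthonormal, centered directions.
    s = math.sqrt(1.0 - target * target)
    return [target * a + s * b for a, b in zip(c_hat, w_hat)]


v = [1.0, 2.0, 4.0, 8.0, 16.0]
u = vector_with_correlation(v, 0.3, random.Random(0))
print(round(pearson(u, v), 6))
```

The resulting vector has zero mean and unit length. Since the Pearson correlation coefficient is unchanged by adding a constant to every entry or by multiplying all entries by a positive number, the result can be shifted and rescaled afterwards to whatever range the synthetic data needs.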

I will assume that the reader possesses a very basic understanding of vectors in a vector space, such as what vectors are and their basic properties, as well as the rules (and visualization) of their addition and subtraction. Additionally, I will assume that the reader is familiar with the inner product (or dot product) of vectors, and its relation to the angle between vectors (or cosine similarity in higher dimensions) and the norm, or magnitude, of the vector. For those who could use a quick refresher, the first few paragraphs of the Wikipedia pages about vector space and inner product are a decent resource.
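As a quick reminder of how these notions connect, the angle \(\theta\) between two nonzero vectors \(\vec{a}\) and \(\vec{b}\) satisfies the identity below, and the Pearson correlation coefficient is the same quantity computed on mean-centered vectors. The symbols \(\vec{a}\), \(\vec{b}\), \(\theta\), \(\bar{u}\), \(\bar{v}\) (the means of the entries of \(\vec{u}\) and \(\vec{v}\)) and \(\vec{1}\) (the all-ones vector) are notation introduced here for illustration and may differ from the notation in the PDF.

\[
\cos\theta = \frac{\langle \vec{a}, \vec{b} \rangle}{\|\vec{a}\|\,\|\vec{b}\|},
\qquad
r(\vec{u},\vec{v}) = \frac{\langle \vec{u} - \bar{u}\vec{1},\; \vec{v} - \bar{v}\vec{1} \rangle}{\|\vec{u} - \bar{u}\vec{1}\|\,\|\vec{v} - \bar{v}\vec{1}\|}
\]

In other words, asking for \(r(\vec{u},\vec{v}) = P\) is the same as asking that the centered versions of \(\vec{u}\) and \(\vec{v}\) form an angle whose cosine is \(P\).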

Please refer to the PDF (LaTeX) document for the full description of the solution.