Russian Federation Economic Indicators

Author: Lily McBeath

Course Project, UC Irvine, Math 10, W22

Introduction

I am planning to explore various economic indicators in the Russian Federation from 2006 to 2020, and to what extent they correlate with one another. I am using data on economic factors from The World Bank’s Data website, cited below. I am also looking into whether any of these economic indicators correlate with the exchange rate between the Russian Ruble and US Dollar. This data may be interesting given the recent sharp fall in the value of the Ruble, following sanctions on Russia related to the war in Ukraine.

Main portion of the project

import numpy as np
import pandas as pd
df = pd.read_csv("russia.csv")

df

	Country Name	Country Code	Indicator Name	Indicator Code	1960	1961	1962	1963	1964	1965	...	2012	2013	2014	2015	2016	2017	2018	2019	2020	Unnamed: 65
0	Russian Federation	RUS	Air transport, freight (million ton-km)	IS.AIR.GOOD.MT.K1	NaN	NaN	NaN	NaN	NaN	NaN	...	4.132144e+03	4.249269e+03	4.413559e+03	4.761047e+03	5.863197e+03	6.845230e+03	6.810610e+03	6.481000e+03	4.314605e+03	NaN
1	Russian Federation	RUS	CPIA efficiency of revenue mobilization rating...	IQ.CPA.REVN.XQ	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	Russian Federation	RUS	CPIA business regulatory environment rating (1...	IQ.CPA.BREG.XQ	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	Russian Federation	RUS	Investment in transport with private participa...	IE.PPI.TRAN.CD	NaN	NaN	NaN	NaN	NaN	NaN	...	3.983900e+09	3.100000e+06	NaN	1.822200e+09	7.940000e+07	2.037000e+09	1.622700e+09	3.357770e+09	9.257100e+08	NaN
4	Russian Federation	RUS	Time required to start a business, male (days)	IC.REG.DURS.MA	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	1.320000e+01	1.150000e+01	1.080000e+01	1.010000e+01	1.010000e+01	1.010000e+01	1.010000e+01	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1438	Russian Federation	RUS	Changes in inventories (current US$)	NE.GDI.STKB.CD	NaN	NaN	NaN	NaN	NaN	NaN	...	6.608344e+10	3.052414e+10	2.005045e+10	2.095911e+10	1.550199e+10	2.557471e+10	2.094207e+10	2.742143e+10	3.310177e+10	NaN
1439	Russian Federation	RUS	Gross national expenditure (constant LCU)	NE.DAB.TOTL.KN	NaN	NaN	NaN	NaN	NaN	NaN	...	9.010340e+13	9.124340e+13	9.030140e+13	8.225460e+13	8.117190e+13	8.452690e+13	8.639170e+13	8.899610e+13	8.505000e+13	NaN
1440	Russian Federation	RUS	Households and NPISHs Final consumption expend...	NE.CON.PRVT.KN	NaN	NaN	NaN	NaN	NaN	NaN	...	4.815000e+13	5.059880e+13	5.164600e+13	4.677560e+13	4.558870e+13	4.727660e+13	4.927200e+13	5.082080e+13	4.648470e+13	NaN
1441	Russian Federation	RUS	Military expenditure (current USD)	MS.MIL.XPND.CD	NaN	NaN	NaN	NaN	NaN	NaN	...	8.146940e+10	8.835290e+10	8.469650e+10	6.642182e+10	6.924529e+10	6.691303e+10	6.160920e+10	6.520134e+10	6.171254e+10	NaN
1442	Russian Federation	RUS	Fixed broadband subscriptions (per 100 people)	IT.NET.BBND.P2	NaN	NaN	NaN	NaN	NaN	NaN	...	1.453068e+01	1.645264e+01	1.724714e+01	1.854085e+01	1.894519e+01	2.137238e+01	2.200089e+01	2.252492e+01	2.321094e+01	NaN

1443 rows × 66 columns

df.isna().sum()[36:]

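1992            882
1993            838
1994            739
1995            750
1996            748
1997            720
1998            706
1999            692
2000            590
2001            579
2002            512
2003            563
2004            562
2005            555
2006            564
2007            499
2008            528
2009            516
2010            495
2011            517
2012            466
2013            489
2014            414
2015            454
2016            458
2017            467
2018            513
2019            591
2020            885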
Unnamed: 65    1443
dtype: int64

We can see that there is a lot of missing data here, which I will address soon.

First I want to drop the Country Name, Country Code, and Unnamed: 65 columns, which are the same for all factors, and the Indicator Code, as the Indicator Name is more descriptive.

df.drop(labels=["Country Name", "Country Code", "Unnamed: 65", "Indicator Code"], axis=1, inplace=True)

Now I want to keep only the years from 2006 onward, and only the factors that have no missing data in those years.

df.drop(df.iloc[:, 1:47], inplace=True, axis=1)
df.dropna(inplace=True)

df

	Indicator Name	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020
0	Air transport, freight (million ton-km)	1.926295e+03	1.224313e+03	2.399593e+03	2.305548e+03	3.531583e+03	3.900120e+03	4.132144e+03	4.249269e+03	4.413559e+03	4.761047e+03	5.863197e+03	6.845230e+03	6.810610e+03	6.481000e+03	4.314605e+03
12	Broad money (% of GDP)	3.762204e+01	4.297675e+01	3.950786e+01	4.929882e+01	5.143911e+01	4.738647e+01	4.728942e+01	5.120003e+01	5.429533e+01	6.182658e+01	5.944583e+01	5.952221e+01	5.911864e+01	5.907597e+01	7.038103e+01
13	Commercial bank branches (per 100,000 adults)	3.036000e+01	3.364000e+01	3.557000e+01	3.464000e+01	3.506000e+01	3.675000e+01	3.825000e+01	3.852000e+01	3.704000e+01	3.293000e+01	3.013000e+01	2.923000e+01	2.626000e+01	2.563000e+01	2.459000e+01
20	Debt service on external debt, total (TDS, cur...	4.789859e+10	3.787248e+10	8.105131e+10	9.113474e+10	5.571607e+10	4.463868e+10	5.006398e+10	5.092890e+10	8.983215e+10	1.040257e+11	9.344588e+10	8.111808e+10	1.099975e+11	9.620816e+10	9.761423e+10
22	Commercial banks and other lending (PPG + PNG)...	2.519503e+10	5.257957e+10	4.573345e+10	-1.850812e+10	2.342559e+10	4.698240e+10	7.076591e+10	4.357459e+10	-3.068587e+10	-5.652469e+10	1.189512e+10	2.225490e+09	-4.589384e+10	-3.161458e+10	-1.757524e+10
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1438	Changes in inventories (current US$)	2.641315e+10	4.119105e+10	5.332577e+10	-3.752001e+10	1.509159e+10	6.013144e+10	6.608344e+10	3.052414e+10	2.005045e+10	2.095911e+10	1.550199e+10	2.557471e+10	2.094207e+10	2.742143e+10	3.310177e+10
1439	Gross national expenditure (constant LCU)	6.764728e+13	7.732504e+13	8.442601e+13	7.140223e+13	7.777010e+13	8.501400e+13	9.010340e+13	9.124340e+13	9.030140e+13	8.225460e+13	8.117190e+13	8.452690e+13	8.639170e+13	8.899610e+13	8.505000e+13
1440	Households and NPISHs Final consumption expend...	3.327116e+13	3.798122e+13	4.195640e+13	3.980456e+13	4.197077e+13	4.478010e+13	4.815000e+13	5.059880e+13	5.164600e+13	4.677560e+13	4.558870e+13	4.727660e+13	4.927200e+13	5.082080e+13	4.648470e+13
1441	Military expenditure (current USD)	3.451778e+10	4.353499e+10	5.618379e+10	5.153212e+10	5.872023e+10	7.023752e+10	8.146940e+10	8.835290e+10	8.469650e+10	6.642182e+10	6.924529e+10	6.691303e+10	6.160920e+10	6.520134e+10	6.171254e+10
1442	Fixed broadband subscriptions (per 100 people)	2.022269e+00	3.420206e+00	6.478241e+00	9.000403e+00	1.094235e+01	1.227170e+01	1.453068e+01	1.645264e+01	1.724714e+01	1.854085e+01	1.894519e+01	2.137238e+01	2.200089e+01	2.252492e+01	2.321094e+01

493 rows × 16 columns
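The drop above selects the year columns by position. As an alternative, here is a minimal sketch that selects them by label, which may be easier to check by eye. The toy DataFrame and the names toy, year_cols, subset, and complete are made up for illustration and are not part of the project data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the World Bank export: one row per indicator, one column per year.
toy = pd.DataFrame({
    "Indicator Name": ["Indicator A", "Indicator B", "Indicator C"],
    "2004": [1.0, 2.0, 3.0],
    "2005": [1.5, np.nan, 3.5],
    "2006": [2.0, 2.5, np.nan],
    "2007": [2.5, 3.0, 4.5],
    "2008": [3.0, 3.5, 5.0],
})

# Keep the name column plus the year columns from 2006 onward, selected by label.
year_cols = [c for c in toy.columns if c.isdigit() and int(c) >= 2006]
subset = toy[["Indicator Name"] + year_cols]

# Drop every indicator that is missing a value in any of the remaining years.
complete = subset.dropna()
print(complete)
```

Note that Indicator B survives even though it is missing 2005, because that year was removed before dropna was called. The order of the two steps matters in the same way in the project code.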

I am going to transform the structure of this DataFrame somewhat to more closely mimic the data we are used to working with in this class.

df = df.transpose().copy()
df.columns = df.iloc[0]
df = df[1:].copy()
df.index.name = 'Year'
df = df.reset_index()
df['Year'] = pd.to_datetime(df['Year']).dt.year
df.iloc[:,1:] = df.iloc[:,1:].apply(pd.to_numeric)

df.head()

Indicator Name	Year	Air transport, freight (million ton-km)	Broad money (% of GDP)	Commercial bank branches (per 100,000 adults)	Debt service on external debt, total (TDS, current US$)	Commercial banks and other lending (PPG + PNG) (NFL, current US$)	Portfolio investment, bonds (PPG + PNG) (NFL, current US$)	External debt stocks (% of GNI)	Secondary income receipts (BoP, current US$)	Foreign direct investment, net outflows (% of GDP)	...	Taxes less subsidies on products (constant LCU)	GNI per capita, Atlas method (current US$)	GDP per capita, PPP (constant 2017 international $)	Manufacturing, value added (constant 2015 US$)	External balance on goods and services (current LCU)	Changes in inventories (current US$)	Gross national expenditure (constant LCU)	Households and NPISHs Final consumption expenditure (constant LCU)	Military expenditure (current USD)	Fixed broadband subscriptions (per 100 people)
0	2006	1926.295	37.622039	30.36	4.789859e+10	2.519503e+10	1.221660e+10	32.386639	5.318470e+09	3.029824	...	7.740037e+12	5810.0	21757.465166	1.496040e+11	3.425900e+12	2.641315e+10	6.764728e+13	3.327116e+13	3.451778e+10	2.022269
1	2007	1224.313	42.976748	33.64	3.787248e+10	5.257957e+10	3.830479e+09	32.759128	6.220470e+09	3.447028	...	8.481176e+12	7560.0	23647.266506	1.608578e+11	2.866600e+12	4.119105e+10	7.732504e+13	3.798122e+13	4.353499e+10	3.420206
2	2008	2399.593	39.507861	35.57	8.105131e+10	4.573345e+10	-2.304212e+10	25.973782	7.345430e+09	3.351460	...	8.942480e+12	9580.0	24887.852720	1.574440e+11	3.812600e+12	5.332577e+10	8.442601e+13	4.195640e+13	5.618379e+10	6.478241
3	2009	2305.548	49.298818	34.64	9.113474e+10	-1.850812e+10	-9.323914e+09	34.361362	6.369080e+09	3.539911	...	7.653594e+12	9230.0	22939.694054	1.344322e+11	2.887700e+12	-3.752001e+10	7.140223e+13	3.980456e+13	5.153212e+10	9.000403
4	2010	3531.583	51.439107	35.06	5.571607e+10	2.342559e+10	3.674379e+09	28.277358	7.258440e+09	3.450434	...	8.211698e+12	9980.0	23961.220293	1.460048e+11	3.739700e+12	1.509159e+10	7.777010e+13	4.197077e+13	5.872023e+10	10.942347

5 rows × 494 columns
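To make the reshaping easier to follow, here is a small sketch of the same idea on a two-indicator toy table. It uses set_index before transposing, which is a slightly different route from the cells above, and the names wide and tidy are made up for illustration.

```python
import pandas as pd

# Toy wide table: one row per indicator, one column per year.
wide = pd.DataFrame({
    "Indicator Name": ["GDP", "Population"],
    "2006": [1.0, 10.0],
    "2007": [2.0, 11.0],
})

# Use the indicator names as the index, then transpose so each year becomes a row.
tidy = wide.set_index("Indicator Name").T

# Move the years out of the index into an ordinary integer column.
tidy.index.name = "Year"
tidy = tidy.reset_index()
tidy.columns.name = None
tidy["Year"] = tidy["Year"].astype(int)
print(tidy)
```

Because only numeric columns are transposed here, the indicator columns stay numeric and no separate conversion step is needed in this toy case.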

df.columns

Index(['Year', 'Air transport, freight (million ton-km)',
       'Broad money (% of GDP)',
       'Commercial bank branches (per 100,000 adults)',
       'Debt service on external debt, total (TDS, current US$)',
       'Commercial banks and other lending (PPG + PNG) (NFL, current US$)',
       'Portfolio investment, bonds (PPG + PNG) (NFL, current US$)',
       'External debt stocks (% of GNI)',
       'Secondary income receipts (BoP, current US$)',
       'Foreign direct investment, net outflows (% of GDP)',
       ...
       'Taxes less subsidies on products (constant LCU)',
       'GNI per capita, Atlas method (current US$)',
       'GDP per capita, PPP (constant 2017 international $)',
       'Manufacturing, value added (constant 2015 US$)',
       'External balance on goods and services (current LCU)',
       'Changes in inventories (current US$)',
       'Gross national expenditure (constant LCU)',
       'Households and NPISHs Final consumption expenditure (constant LCU)',
       'Military expenditure (current USD)',
       'Fixed broadband subscriptions (per 100 people)'],
      dtype='object', name='Indicator Name', length=494)

I will perform an initial analysis with the population data, looking at how the rural population is related to dependency.

df.iloc[:,df.columns.str.contains('Rural population')]

Indicator Name	Rural population growth (annual %)	Rural population (% of total population)	Rural population
0	-0.497164	26.492	37896710.0
1	-0.341091	26.447	37767668.0
2	-0.214246	26.402	37686839.0
3	-0.136685	26.358	37635362.0
4	-0.125974	26.313	37587981.0
5	-0.093185	26.268	37552971.0
6	-0.056555	26.209	37531739.0
7	-0.062143	26.137	37508423.0
8	-0.115774	26.050	37465023.0
9	-0.192057	25.950	37393138.0
10	-0.270029	25.836	37292302.0
11	-0.389793	25.708	37147222.0
12	-0.563045	25.567	36938654.0
13	-0.653729	25.413	36697963.0
14	-0.868789	25.246	36380516.0

df.iloc[:,df.columns.str.contains('dependency')]

Indicator Name	Age dependency ratio, young (% of working-age population)	Age dependency ratio (% of working-age population)	Age dependency ratio, old (% of working-age population)
0	20.942940	40.335644	19.392704
1	20.632962	39.705743	19.072782
2	20.490918	39.157910	18.666992
3	20.527639	38.884284	18.356645
4	20.742029	38.953829	18.211800
5	21.144384	39.410853	18.266469
6	21.793443	40.217864	18.424421
7	22.605389	41.290973	18.685584
8	23.454453	42.495255	19.040802
9	24.259000	43.741551	19.482552
10	25.122954	45.295788	20.172834
11	25.893200	46.837028	20.943828
12	26.574334	48.343277	21.768943
13	27.198002	49.810602	22.612600
14	27.767416	51.220020	23.452604

df = df.rename(columns={"Rural population (% of total population)":"Rural population percent of total", "Age dependency ratio (% of working-age population)":"Age dependency ratio"})

According to the World Bank’s Glossary, the age dependency ratio is the ratio of those younger than 15 or older than 64 to the working-age population, and the rural population as a percentage of total population is found according to the data from the United Nations Population Division.

I will use the rural population as a percentage of total population as input, and the age dependency ratio as a percent of working-age population as output. Since these are both percentage values, rescaling is likely unnecessary.

from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(df[['Rural population percent of total']], df['Age dependency ratio'])

LinearRegression()

df['Age dependency ratio pred'] = reg.predict(df[['Rural population percent of total']])

Now we will plot the results in Altair. The method for creating repeat layered charts is taken from the Altair documentation and requires Altair version 4.2.0 to run. If the charts display an error, uncomment the !pip install altair==4.2.0 line, run the cell, comment the !pip install altair==4.2.0 line again, and run the cell once more.

import altair as alt
# !pip install altair==4.2.0 # (run this if graphs are not showing, then run graphs again)
c1 = alt.Chart(df).mark_line().encode(
    x="Rural population percent of total:Q",
    y=alt.Y('Age dependency ratio pred:Q',
        scale=alt.Scale(zero=False)
    )
).properties(
    title="Predicted and Actual Age Dependency Ratio"
)
c2 = alt.Chart(df).mark_point().encode(
    x="Rural population percent of total",
    y=alt.Y('Age dependency ratio:Q',
        scale=alt.Scale(zero=False)
    )
)
c3 = alt.Chart(df).mark_line().encode(
    x = 'Year:O',
    y=alt.Y(alt.repeat('layer'),
        type='quantitative',
        title='Rural Population and Age Dependency Ratio',
        scale=alt.Scale(zero=False)
        ),
    color=alt.ColorDatum(alt.repeat('layer'))
).properties(
    title="Actual Age Dependency vs. Rural Population"
).repeat(layer=["Age dependency ratio", "Rural population percent of total"])
alt.layer(c1, c2)|c3

print(f"Age dependency ratio appears negatively related to rural population percentage, with a fitted slope of about {round(reg.coef_[0])}.")

Age dependency ratio appears negatively related to rural population percentage, with a fitted slope of about -11.

from sklearn.metrics import mean_absolute_error
mean_absolute_error(df['Age dependency ratio'],df['Age dependency ratio pred'])

0.637496926380498
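For reference, the mean absolute error is the average of the absolute differences between actual and predicted values. The sketch below checks this by hand against scikit-learn on made-up numbers (not the project data); the names y_true, y_pred, manual_mae, and sklearn_mae are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted values, in percentage points.
y_true = np.array([40.0, 42.0, 45.0, 50.0])
y_pred = np.array([41.0, 41.5, 46.0, 48.5])

# Average of the absolute differences, computed by hand.
manual_mae = np.mean(np.abs(y_true - y_pred))

# The same quantity from scikit-learn.
sklearn_mae = mean_absolute_error(y_true, y_pred)
print(manual_mae, sklearn_mae)
```

Since the error is in the same units as the output, the value reported above can be read directly in percentage points of the age dependency ratio.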

It appears from this preliminary work that rural population is a reasonably good indicator of the age dependency ratio over this period: as the rural population as a percent of total population increases, the age dependency ratio as a percent of working-age population decreases, and the linear fit has a mean absolute error of about 0.64 percentage points.

The next step will be to consider some other non-population-related indicators and compare with the Russian Ruble/US Dollar exchange rate. The Ruble to US Dollar historical spot rates were obtained from the Bank of England’s Statistical Interactive Database. We will reverse the order of the rows to sort by ascending years, and invert the average annual USD/RUB rates given to obtain the RUB/USD rates.

rubusd = pd.read_csv("rubusd.csv")
rubusd = rubusd.reindex(index=rubusd.index[::-1])
rubusd.reset_index(inplace=True)
df["RATE"] = rubusd.iloc[:,2]
df["RATE"] = 1/df["RATE"]
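The cell above relies on the two tables having their rows in the same order. As a hedged alternative, the sketch below sorts a toy rates table by year, inverts the rate, and joins on the year explicitly. The column names rubles_per_dollar and dollars_per_ruble, and both toy tables, are assumptions for illustration; the real rubusd.csv layout may differ.

```python
import pandas as pd

# Hypothetical rates table, newest year first, quoted as rubles per dollar.
rates = pd.DataFrame({
    "Year": [2008, 2007, 2006],
    "rubles_per_dollar": [50.0, 40.0, 25.0],
})

# Sort into ascending years and invert to get the value of one ruble in dollars.
rates = rates.sort_values("Year").reset_index(drop=True)
rates["dollars_per_ruble"] = 1 / rates["rubles_per_dollar"]

# Toy indicator table with one row per year.
indicators_toy = pd.DataFrame({"Year": [2006, 2007, 2008], "some_indicator": [1.0, 2.0, 3.0]})

# Join on Year so each rate lands on the matching row regardless of row order.
merged = indicators_toy.merge(rates[["Year", "dollars_per_ruble"]], on="Year", how="left")
print(merged)
```

Joining on the year makes a mismatch visible as a missing value rather than as a silently shifted rate.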

Let’s examine how the indicators we looked at earlier compare to the RUB/USD rate from 2006 to 2020 using a chart, rescaling to more clearly compare the values.

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df = df.rename(columns={"Rural population percent of total":"Rural population percent", "Age dependency ratio":"Age dependency", "RATE":"Ruble to Dollar"})
dfscaled = df.copy()
dfscaled.iloc[:,1:] = scaler.fit_transform(df.iloc[:,1:])
c4 = alt.Chart(dfscaled).mark_line().encode(
    x = 'Year:O',
    y=alt.Y(alt.repeat('layer'), type='quantitative', title='Scaled Indicators and Exchange Rate'),
    color=alt.ColorDatum(alt.repeat('layer'))
).properties(
    title="Scaled Rural Population Percentage and Age Dependency Ratio vs. Ruble to Dollar Exchange Rates",
    width=700
).repeat(
    layer=["Rural population percent","Age dependency", "Ruble to Dollar"]
)
c4

It is interesting that the rural population as a percent of total population in Russia (blue) seems to decline together with the RUB/USD rate (red), while the age dependency ratio as a percent of working population in Russia (orange) has been increasing since 2009.
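The rescaling used for this chart subtracts each column's mean and divides by its standard deviation, so every column ends up with mean 0 and standard deviation 1. A minimal sketch on made-up numbers (the array X and the names scaled and manual are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two made-up columns on different scales.
X = np.array([[26.0, 40.0],
              [25.5, 45.0],
              [25.0, 50.0]])

# Rescale with scikit-learn.
scaled = StandardScaler().fit_transform(X)

# The same computation by hand; np.std uses the population standard deviation (ddof=0) by default.
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(scaled)
```

This is why the three lines in the chart can share one axis: only the shape of each series over time is kept, not its original units.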

What do the remaining indicators look like, in comparison to the Ruble / Dollar exchange rates?

Let us see if we can use some of these indicators to predict the Ruble to Dollar exchange rates. After looking into the definitions of leading vs. lagging indicators and the top economic indicators for the U.S. economy (acknowledging that these may not be optimal for an analysis of the Russian economy, but useful nonetheless for our purposes), I will try to predict the exchange rates using the following economic indicators, which include the rural population and age dependency metrics we looked at earlier, as well as others that are relevant to a nation’s economy:

Rural population percent
Age dependency
GDP (current US$)
Population growth (annual %)
Real interest rate (%)
Inflation, consumer prices (annual %)
Unemployment, total (% of total labor force) (national estimate)
Stocks traded, total value (current US$)
Merchandise trade (% of GDP)
Air transport, passengers carried
International tourism, number of arrivals
Net primary income (Net income from abroad) (current US$)
Refugee population by country or territory of origin
Foreign direct investment, net inflows (% of GDP)

Here are the indicators plotted together.

indicators = ['Rural population percent', 'Age dependency', 'GDP (current US$)', 'Population growth (annual %)', 'Real interest rate (%)', 'Inflation, consumer prices (annual %)', 'Unemployment, total (% of total labor force) (national estimate)', 'Stocks traded, total value (current US$)', 'Merchandise trade (% of GDP)', 'Air transport, passengers carried', 'International tourism, number of arrivals', 'Net primary income (Net income from abroad) (current US$)', 'Refugee population by country or territory of origin', 'Foreign direct investment, net inflows (% of GDP)']
c5 = alt.Chart(dfscaled).mark_line().encode(
    x = 'Year:O',
    y = alt.Y(alt.repeat('layer'), type='quantitative', title='Scaled Indicators'),
    color = alt.ColorDatum(alt.repeat('layer')), #.Color(scale=alt.Scale(scheme = 'category20c'))
    strokeDash = alt.StrokeDashDatum(alt.repeat('layer')),
).properties(
    title="Scaled Economic Indicators in Russia from 2006 to 2020",
    width=600
).repeat(
    layer=indicators
)
c5

We can see that there is a significant amount of variation in these indicators’ values over the 15 years. Also, notice that international tourism decreases sharply in 2020 (as expected).

We will use the years 2006-2016 as our training set and the years 2017-2020 as our test set, so that we can test the accuracy of our model on the data and be warned of possible overfitting. Since I want to use these specific rows for my training and test sets, I will use iloc to define them rather than train_test_split.

First we will attempt to use Linear Regression, in hopes that the coefficients can give us some idea of how these indicators might be related to the exchange rate.

X_train = df[indicators].iloc[:11]
X_test = df[indicators].iloc[11:]
y_train = df['Ruble to Dollar'].iloc[:11]
y_test = df['Ruble to Dollar'].iloc[11:]
df_train = df.iloc[:11].copy()
df_test = df.iloc[11:].copy()
lrg = LinearRegression()
lrg.fit(X_train, y_train)

LinearRegression()

To visually evaluate the results of Linear Regression, I will use methods adapted from the Pandas user guide.

coefficients = pd.DataFrame(indicators, columns=["Indicators"])
coefficients["Coefficients"] = lrg.coef_
coefficients.set_index("Indicators", inplace=True)
def style_positive(v, props=''):
    return props if v >= 0 else None
def style_negative(v, props=''):
    return props if v < 0 else None
def highlight_max(s, props=''):
    return np.where(s == np.max(s.values), props, '')
def highlight_min(s, props=''):
    return np.where(s == np.min(s.values), props, '')
coefficients[["Coefficients"]].style.format({"Coefficients": '{:.2E}'})\
                                    .applymap(style_positive, props='color:green;')\
                                    .applymap(style_negative, props='color:red;')\
                                    .applymap(lambda v: 'opacity: 20%;' if (v < 0.000001) and (v > -0.000001) else None)\
                                    .apply(highlight_max, props='color:white;background-color:green', axis=0)\
                                    .apply(highlight_min, props='color:white;background-color:red', axis=0)

	Coefficients
Indicators
Rural population percent	3.48E-06
Age dependency	-2.08E-04
GDP (current US$)	-1.06E-14
Population growth (annual %)	9.57E-06
Real interest rate (%)	-6.69E-05
Inflation, consumer prices (annual %)	-2.76E-04
Unemployment, total (% of total labor force) (national estimate)	1.73E-05
Stocks traded, total value (current US$)	1.58E-14
Merchandise trade (% of GDP)	-6.10E-05
Air transport, passengers carried	-1.40E-10
International tourism, number of arrivals	3.94E-10
Net primary income (Net income from abroad) (current US$)	-4.33E-13
Refugee population by country or territory of origin	8.23E-08
Foreign direct investment, net inflows (% of GDP)	1.53E-04

Above, we see indicators with relatively small coefficients in low opacity, negative coefficients (i.e. a negative association with the exchange rate according to the model) in red, and positive coefficients in green. We also see the highlighted maximum and minimum coefficients, representing the indicators with the largest positive and the most negative coefficient, respectively. Because the indicators were not rescaled before fitting, the size of each coefficient depends on the units of its indicator, so these comparisons are only suggestive. It appears that net inflows of foreign direct investment as a percent of GDP may be an indicator of growth in the RUB/USD exchange rate, while inflation of consumer prices may be an indicator of decline in the RUB/USD exchange rate. This seems reasonable from an economic standpoint.
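One caveat when reading a coefficient table like this: coefficients fitted on unscaled inputs depend on each input's units. The sketch below uses synthetic data (not the project data) in which two features matter equally but live on very different scales. The names small, large, raw_coef, and std_coef are made up, and this is only an illustration of the effect, not a re-analysis of the model above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200

# Two synthetic features: one on a percentage-like scale, one on a dollar-like scale.
small = rng.normal(0.0, 1.0, n)
large = rng.normal(0.0, 1e6, n)

# By construction, one standard deviation of either feature moves y by about 2.
y = 2.0 * small + 2.0 * (large / 1e6)
X = np.column_stack([small, large])

# Coefficients on the raw features differ by about six orders of magnitude.
raw_coef = LinearRegression().fit(X, y).coef_

# Coefficients on standardized features are directly comparable.
std_coef = LinearRegression().fit(StandardScaler().fit_transform(X), y).coef_
print(raw_coef, std_coef)
```

Fitting on standardized inputs, for example on dfscaled[indicators], would be one way to make a coefficient table easier to compare across indicators, though that is not what was done here.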

Now we will attempt to gauge the accuracy of this model.

from sklearn.metrics import mean_absolute_error
train_error_lrg = mean_absolute_error(lrg.predict(X_train), y_train)
test_error_lrg = mean_absolute_error(lrg.predict(X_test), y_test)
train_error_lrg

7.273399827656654e-09

test_error_lrg

0.003865468556259171

test_error_lrg/train_error_lrg

531452.7796974622

test_error_lrg/(y_test.mean())

0.24804983762756344

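Notice that the test error is larger than the training error by more than five orders of magnitude (a ratio of about 5.3 × 10^5), which suggests overfitting. This is not surprising: with 14 indicators and only 11 training rows, linear regression can fit the training data almost exactly.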

The test error from linear regression is about 25% of the average RUB/USD value in the test set.
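A near-zero training error is what we would expect whenever there are more input columns than training rows, even if the inputs carry no information at all. The sketch below shows this with pure random noise shaped like our split (11 training rows, 4 test rows, 14 columns). All names starting with noise_ are made up, and no project data is used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)

# Pure noise with the same shape as the project's split: 14 features, 11 train rows, 4 test rows.
noise_X_train = rng.normal(size=(11, 14))
noise_y_train = rng.normal(size=11)
noise_X_test = rng.normal(size=(4, 14))
noise_y_test = rng.normal(size=4)

# With more features than rows, least squares can reproduce the training targets exactly.
noise_model = LinearRegression().fit(noise_X_train, noise_y_train)
noise_train_mae = mean_absolute_error(noise_y_train, noise_model.predict(noise_X_train))
noise_test_mae = mean_absolute_error(noise_y_test, noise_model.predict(noise_X_test))
print(noise_train_mae, noise_test_mae)
```

So the tiny training error above says little about the model by itself; the test error is the more meaningful number.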

Now we will try a different approach: K-Nearest Neighbors Regression.

from sklearn.neighbors import KNeighborsRegressor
kng3 = KNeighborsRegressor(n_neighbors=3)
kng3.fit(X_train,y_train)

KNeighborsRegressor(n_neighbors=3)

train_error_kng3 = mean_absolute_error(kng3.predict(X_train), y_train)
test_error_kng3 = mean_absolute_error(kng3.predict(X_test), y_test)
train_error_kng3

0.0038378584123845735

test_error_kng3

0.006711039436260572

test_error_kng3/train_error_kng3

1.7486417462938157

test_error_kng3/y_test.mean()

0.4306521236037697

Notice that this time our training and test errors are closer in value, which suggests that we do not have a problem with overfitting. However, the test error as a percentage of the mean of the RUB/USD values is larger, about 43%.
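One thing to keep in mind with K-Nearest Neighbors is that it works with distances, so a column with very large values (such as GDP in US$) can dominate which rows count as "nearest". The sketch below illustrates this on four made-up points and compares an unscaled model with one that standardizes first. The data and the names X_toy, y_toy, query, raw_pred, and scaled_pred are illustrative, and whether scaling would help on the project data is not tested here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Column 0 is small and determines y; column 1 is huge and unrelated to y.
X_toy = np.array([[1.0, 2e9],
                  [5.0, 1e9],
                  [1.2, 4e9],
                  [5.2, 5e9]])
y_toy = np.array([0.0, 1.0, 0.0, 1.0])
query = np.array([[1.1, 1.1e9]])

# Unscaled: neighbors are chosen almost entirely by the huge column.
raw_knn = KNeighborsRegressor(n_neighbors=2).fit(X_toy, y_toy)
raw_pred = raw_knn.predict(query)[0]

# Standardized first: both columns contribute comparably to the distance.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=2)).fit(X_toy, y_toy)
scaled_pred = scaled_knn.predict(query)[0]
print(raw_pred, scaled_pred)
```

In this toy case the unscaled model averages one neighbor from each group, while the scaled model picks the two rows that are close in column 0.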

Let us visually compare our results from both methods.

df_test["Linear regression prediction"] = lrg.predict(X_test)
df_test["3-Neighbors regression prediction"] = kng3.predict(X_test)
df_test["Actual Ruble to Dollar"] = y_test
c6 = alt.Chart(df_test).mark_line().encode(
    x = 'Year:O',
    y=alt.Y(alt.repeat('layer'),
        type='quantitative',
        title='Predicted and Actual RUB/USD'
        #scale=alt.Scale(zero=False)
        ),
    color=alt.ColorDatum(alt.repeat('layer'))
).properties(
    title="Predicted vs. Actual RUB/USD Using Linear Regression, K-Neighbors Regression",
    width=500,
    height=300
).repeat(layer=["Actual Ruble to Dollar", "Linear regression prediction", "3-Neighbors regression prediction"])
c6

It is clear that the linear regression method underestimates the RUB/USD exchange rates, while the K-Neighbors prediction is an overestimate.

Would a different value of K give a better model? We investigate using the method for plotting the train and test curves from the course notes.

def get_scores(k):
    reg = KNeighborsRegressor(n_neighbors=k)
    reg.fit(X_train, y_train)
    train_error = mean_absolute_error(reg.predict(X_train), y_train)
    test_error = mean_absolute_error(reg.predict(X_test), y_test)
    return (train_error, test_error)
df_scores = pd.DataFrame({"k":range(1,12),"train_error":np.nan,"test_error":np.nan})
for i in df_scores.index:
    df_scores.loc[i,["train_error","test_error"]] = get_scores(df_scores.loc[i,"k"])
df_scores["kinv"] = 1/df_scores.k
c7 = alt.Chart(df_scores).mark_line().encode(
    x = "kinv",
    y=alt.Y(alt.repeat('layer'),
        type='quantitative',
        title='Mean Absolute Error'
        #scale=alt.Scale(zero=False)
        ),
    color=alt.ColorDatum(alt.repeat('layer'))
).properties(
    title="Train Error and Test Error for Different Values of K",
    width=600
).repeat(layer=["train_error", "test_error"])
c7

It appears from the above graph that 1/K = 0.25 (so K = 4) may give a slightly better prediction than the original K = 3. We will avoid K = 1, despite the seemingly accurate results above: with K = 1 each training point is its own nearest neighbor, so the training error is trivially zero and the model overfits.

As a final step, we will observe all of these predictions together along with the RUB/USD actual rates, from 2017 to 2020:

kng4 = KNeighborsRegressor(n_neighbors=4)
kng4.fit(X_train,y_train)
df_test["4-Neighbors regression prediction"] = kng4.predict(X_test)
c8 = alt.Chart(df_test).mark_line().encode(
    x = 'Year:O',
    y=alt.Y(alt.repeat('layer'),
        type='quantitative',
        title='Predicted and Actual RUB/USD'
        #scale=alt.Scale(zero=False)
        ),
    color=alt.ColorDatum(alt.repeat('layer'))
).properties(
    title="Predicted vs. Actual RUB/USD Using Linear Regression, K-Neighbors Regression",
    width=500,
    height=300
).repeat(layer=["Actual Ruble to Dollar", "Linear regression prediction", "3-Neighbors regression prediction", "4-Neighbors regression prediction"])
c8

Summary

Although we had only a few years of data to work with, it was possible to gain some insights into the relationship between key economic indicators in Russia and the Ruble to U.S. Dollar exchange rates over the past 15 years. The linear regression model pointed to inflation and foreign investment as potentially significant economic indicators relating to the value of the Ruble. Additionally, predictions using linear regression and K-nearest neighbors regression appear to have some degree of usefulness, although this is limited.

There are a few potential areas where this analysis could be improved through further work. First, if more frequent data were available (quarterly, monthly, weekly, etc.), this would likely improve the accuracy of our models. Unfortunately such data did not appear to be readily available.

Additionally, the year 2020 presents a possible outlier in our data, considering how international tourism and other economic indicators were at uncharacteristic levels during this year. That being said, the year 2020 remains in this project for two reasons: the first being that removing it would constitute a 7% reduction in the number of data points, and the second being that a good model of RUB/USD exchange rates with regard to economic factors would ideally be able to accurately model the Ruble’s value even in times of crisis and uncertainty.

References

Bank of England (2022) - “Interest & exchange rates data.” Published online at BankOfEngland.co.uk. Retrieved from: https://www.bankofengland.co.uk/boeapps/database/index.asp?first=yes&SectionRequired=I&HideNums=-1&ExtraInfo=true

Bruce C. Dieffenbach (2014) - “Leading Economic Indicators.” Published online at Albany.edu/~bd445/. Retrieved from: https://www.albany.edu/~bd445/Economics_301_Intermediate_Macroeconomics_Slides_Spring_2014/Leading_Economic_Indicators_(Print).pdf

The Conference Board (2012) - “Description of Components.” Published online at Conference-Board.org. Retrieved from: https://www.conference-board.org/data/bci/index.cfm?id=2160

Hannah Ritchie, Edouard Mathieu, Max Roser, Bastian Herre, Joe Hasell, Esteban Ortiz-Ospina, Bobbie Macdonald, Fiona Spooner and Pablo Rosado (2022) - “War in Ukraine.” Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org/ukraine-war

Max Roser, Bastian Herre and Joe Hasell (2013) - “Nuclear Weapons.” Published online at OurWorldInData.org. Retrieved from: https://ourworldindata.org/nuclear-weapons

World Bank (2020) - “Russian Federation Data.” Published online at Data.WorldBank.org. Retrieved from: https://data.worldbank.org/country/russian-federation

Created in Deepnote
