Scatter Graphs

In this article, I’ll publish the scatter graphs of each feature against my target (the Bitcoin price).

The first two graphs plot the volume of Bitcoin and the Google Trends score for the word “Bitcoin” against my target. As you can see, there is a positive correlation with the volume but no correlation with the Bitcoin trend:

100

The next two graphs plot the volume of Litecoin and the average price of Litecoin against my target. As you can see, there is a weak positive correlation with the volume but a strong positive correlation with the average price of Litecoin:

101

The next two graphs plot the Litecoin and Ethereum Google Trends scores against my target. As you can see, neither graph shows a correlation, and both show odd behavior:

102

The next two graphs plot the Google Trends score for “LTC” and the volume of Ripple against my target. As you can see, neither graph shows a correlation. Again, both show odd behavior:

103

The last three graphs plot the average price of Ripple and the Google Trends scores for “Ripple” and “XRP” against my target. As you can see, the last two graphs show no correlation, while the first shows a weak positive correlation with odd behavior:

104

105
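
The visual impressions from the scatter graphs can also be expressed as numbers. Here is a minimal sketch that computes the Pearson correlation of each feature with the target using pandas. The column names and the values are made up for illustration and are not the project's real data.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "btc_price":  [100.0, 200.0, 300.0, 400.0, 500.0],
    "btc_volume": [10.0, 22.0, 29.0, 41.0, 50.0],
    "btc_trend":  [5.0, 1.0, 4.0, 1.0, 5.0],
})

# Pearson correlation of every feature with the target column.
# Values near +1 or -1 mean a strong linear relation, values near 0 mean none.
correlations = df.corr(method="pearson")["btc_price"].drop("btc_price")
print(correlations)
```

A coefficient like this only captures linear relations, so it complements the scatter graphs rather than replacing them.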

 

 

 

Neural Networks

I chose Neural Networks as my second Machine Learning technique since they work well with integer or float inputs. In addition, since my inputs were huge numbers, it was better to normalize them, which I did.

I applied the same cleaning procedures to my data, with the same target and features as in the previous articles, then ran the Neural Networks algorithm. This was the result:
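
One common way to normalize large inputs is min-max scaling, which maps every column to the range from 0 to 1. The sketch below assumes that approach; the post does not say which normalization was actually used, and the helper name `min_max_normalize` and the sample values are hypothetical.

```python
import pandas as pd

def min_max_normalize(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column of the frame to the [0, 1] range."""
    span = frame.max() - frame.min()
    # A constant column has a span of 0; use 1 instead to avoid dividing by zero.
    span = span.replace(0, 1)
    return (frame - frame.min()) / span

# Hypothetical features with very different magnitudes.
features = pd.DataFrame({
    "volume": [1_000_000.0, 5_000_000.0, 9_000_000.0],
    "trend": [10.0, 55.0, 100.0],
})
normalized = min_max_normalize(features)
print(normalized)
```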

15

Again, an impressive result (using all of the data).

Decision Trees worked better because my data set the perfect scene for a decision tree. Decision trees work best with numerical and categorical data, which is why they outperformed the Neural Networks.

Decision Trees 2

Since combining the data of each month produced awful results, I decided not to do that, but this led to another problem. As I said in the previous article, Google Trends only offers monthly data when a large time range is chosen; it only provides daily data when the time range doesn’t exceed 6 to 8 months. This meant that I had to download the data in small sections and then combine them to obtain daily Google Trends values. After combining everything, I had Google Trends datasets containing daily data from 2013 up until 2018. I also applied the same cleaning procedures as before, with a few extra steps, and then ran the Decision Tree algorithm. These were the results:

14

The accuracy scores improved hugely, but the param model’s accuracy score is actually lower than the naive model’s, which is okay. The param model doesn’t always guarantee an improvement in results.
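
The step of stitching the short Google Trends downloads into one daily series could look roughly like the sketch below. The helper name `combine_chunks`, the column names, and the sample values are assumptions. Google Trends scales each download relative to its own time window, so chunks are not directly comparable; any rescaling between chunks is not shown here.

```python
import pandas as pd

def combine_chunks(chunks):
    """Stack several daily downloads into one series, one row per day."""
    combined = pd.concat(chunks, ignore_index=True)
    combined["Day"] = pd.to_datetime(combined["Day"])
    # Overlapping windows repeat some days; keep the first value seen.
    combined = combined.drop_duplicates(subset="Day", keep="first")
    return combined.sort_values("Day").reset_index(drop=True)

# Two hypothetical downloads that overlap on 2013-01-02.
first = pd.DataFrame({"Day": ["2013-01-01", "2013-01-02"], "bitcoin": [3, 4]})
second = pd.DataFrame({"Day": ["2013-01-02", "2013-01-03"], "bitcoin": [4, 6]})

daily_trend = combine_chunks([first, second])
print(daily_trend)
```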

Machine Learning Try 1 (Decision Trees)

My project involves a lot of numbers and a lot of dirty data, so before I could run a Machine Learning algorithm on it, I had to clean the data carefully, which took a lot of time. Cleaning the data, or obtaining clean tables with good data, was the most difficult part of this project.

First, I checked whether my data had any null values, and it did. In order to obtain good results, I had to fill in the null values of Volume in the three files that contained the prices of Bitcoin, Ripple, and Litecoin.

1_Proj

3_Proj
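
A minimal sketch of checking for and filling null values with pandas follows. Filling with the column median is an assumption, since the post does not state which fill value was used, and the sample values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of a price file with missing Volume values.
bitcoin = pd.DataFrame({
    "Date": ["2013-04-28", "2013-04-29", "2013-04-30", "2013-05-01"],
    "Volume": [np.nan, 120.0, np.nan, 80.0],
})

# Count the nulls first to confirm that cleaning is needed.
missing_before = int(bitcoin["Volume"].isnull().sum())

# Assumption: replace missing volumes with the median of the known ones.
bitcoin["Volume"] = bitcoin["Volume"].fillna(bitcoin["Volume"].median())
print(missing_before, bitcoin["Volume"].tolist())
```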

After that, since the files had the High and Low prices of each cryptocurrency, I calculated the average of these two columns and replaced them with a single column containing only the average price.

2_Proj
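
Replacing the High and Low columns with their average can be sketched as follows. The column names `High`, `Low`, and `Open` and the values are assumptions about the CSV layout.

```python
import pandas as pd

# Hypothetical excerpt of a price file.
prices = pd.DataFrame({
    "Date": ["2018-02-01", "2018-02-02"],
    "Open": [9.0, 10.0],
    "High": [10.0, 12.0],
    "Low": [8.0, 9.0],
})

# The average price is the midpoint of the daily high and low.
prices["Average"] = (prices["High"] + prices["Low"]) / 2

# Drop the two source columns and any column not needed as a feature.
prices = prices.drop(columns=["High", "Low", "Open"])
print(prices)
```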

This is what the data looks like after this step and after dropping the irrelevant columns:

5

In addition, the volume values in each CSV file contained commas, which Python can’t parse as integers, so I replaced them with dots, which Python understands:

4_proj
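
As an alternative to replacing the commas with dots, the sketch below removes them entirely. It assumes the commas are thousands separators (for example "1,234,567"), in which case stripping them keeps the full integer value. The sample values are made up.

```python
import pandas as pd

# Hypothetical Volume column as it might be read from a CSV file.
raw = pd.DataFrame({"Volume": ["1,234,567", "89,000", "512"]})

# Remove the separators, then convert the strings to integers.
raw["Volume"] = raw["Volume"].str.replace(",", "", regex=False).astype("int64")
print(raw["Volume"].tolist())
```

When reading the file directly, `pd.read_csv(path, thousands=",")` performs the same conversion at load time.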

After obtaining such tables for each file, I thought there were too many dates, which later turned out to be wrong. I decided to combine the values of each month of every year into one row by calculating the monthly mean, which removed a lot of rows. In addition, my Google Trends data started in 2004 and was only offered in months when a large time range is chosen, so I had to keep the dates consistent across all of the data (starting and ending dates as well). I also had to convert the dates to date variables:

6
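
Converting the dates and collapsing each month into its mean can be sketched like this. The column names and values are assumptions.

```python
import pandas as pd

# Hypothetical daily prices with dates stored as strings.
daily = pd.DataFrame({
    "Date": ["2018-01-01", "2018-01-02", "2018-02-01", "2018-02-03"],
    "Average": [10.0, 20.0, 30.0, 50.0],
})

# Convert the strings to real datetime values.
daily["Date"] = pd.to_datetime(daily["Date"])

# Group the rows by calendar month and take the mean of each group.
monthly = daily.groupby(daily["Date"].dt.to_period("M"))["Average"].mean()
print(monthly)
```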

This reduced the data from 1600+ rows to only 55 rows, and I did the same for every CSV file:

7

 

The Google Trends data also needed some cleaning. Its values range from 0 to 100 and indicate the popularity of the search term during that time. Some of the values were written as “<1”, so I replaced them with 1, since the difference between 0 and 1 is negligible in terms of popularity.

8
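
The replacement of the “<1” entries can be sketched as follows. The column names and values are assumptions.

```python
import pandas as pd

# Hypothetical Google Trends export where low values appear as the text "<1".
trend = pd.DataFrame({
    "Month": ["2013-03", "2013-04", "2013-05"],
    "bitcoin": ["<1", "3", "100"],
})

# Swap the marker for "1", then convert the whole column to integers.
trend["bitcoin"] = trend["bitcoin"].replace("<1", "1").astype(int)
print(trend["bitcoin"].tolist())
```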

Some of the datasets had dates that didn’t match those of my other datasets, so I had to drop these dates from the affected datasets:

9.jpg

After dropping some of the dates, the indices of the datasets changed, so I had to linearize (reset) them for all of the data frames in this project. Once all of them were linearized, I merged all of the data into one data frame:

10
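
A minimal sketch of dropping unmatched dates, resetting the indices, and merging is shown below for two data frames. The frame and column names are assumptions, and the real project merged more frames than this.

```python
import pandas as pd

# Two hypothetical frames whose date ranges only partly overlap.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"]),
    "btc_average": [100.0, 110.0, 120.0],
})
trend = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-04"]),
    "btc_trend": [40, 55, 60],
})

# Keep only the dates present in both frames, then renumber rows from 0.
prices = prices[prices["Date"].isin(trend["Date"])].reset_index(drop=True)
trend = trend[trend["Date"].isin(prices["Date"])].reset_index(drop=True)

# Merge on the shared Date column into a single frame.
result = prices.merge(trend, on="Date", how="inner")
print(result)
```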

Here is a description of the data in the single data frame, called Result:

10

Since I wanted to see what affects the price of Bitcoin, I decided to use Decision Trees as my first Machine Learning technique. In addition, I was dealing with a lot of numbers, and the price of Bitcoin could easily be categorized into ‘low’, ‘medium’, and ‘high’. To determine the ranges of these categories, I calculated the mean and the median, looked at the maximum and minimum values of Bitcoin, and then decided the ranges accordingly.

From 0 to 550 = ‘low’, from 550 to 900 = ‘medium’, and from 900 to 15200 = ‘high’, since I only had 55 rows of data:

12
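
The categorization into the three ranges can be sketched with `pd.cut`. The sample prices are made up, and how a value that falls exactly on a boundary (550 or 900) is assigned is an assumption: with the default settings used here it goes to the lower category.

```python
import pandas as pd

# Hypothetical average Bitcoin prices.
prices = pd.Series([120.0, 550.0, 700.0, 900.0, 15000.0], name="btc_average")

# Bin edges taken from the ranges in the post.
bins = [0, 550, 900, 15200]
labels = ["low", "medium", "high"]

# Each interval is open on the left and closed on the right, e.g. (550, 900].
categories = pd.cut(prices, bins=bins, labels=labels)
print(categories.tolist())
```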

After all of this, I had the average price of Bitcoin as my target and the rest of the columns as my features, of which there were only 12 (unfortunately). I ran the Decision Tree algorithm on the cleaned data, and these were the results:

134

Awful results, I must say, but this encouraged me to run the Decision Tree algorithm without combining the data into months, which meant that I had 1600+ rows and really improved the results.
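
For readers who want to reproduce the general setup, here is a minimal sketch using scikit-learn. It assumes that the “naive model” is a decision tree with default settings and the “param model” is one tuned with a grid search; the posts do not confirm this. The data is synthetic and the parameter grid is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: four numeric features, a three-class target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.where(X[:, 0] < -0.5, "low", np.where(X[:, 0] < 0.5, "medium", "high"))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Naive model: a decision tree with default hyperparameters.
naive = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
naive_score = naive.score(X_test, y_test)

# Param model: the same classifier tuned by cross-validated grid search.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5, 10]},
    cv=5,
)
search.fit(X_train, y_train)
param_score = search.score(X_test, y_test)

print(naive_score, param_score, search.best_params_)
```

The grid search picks the parameters that score best in cross-validation on the training split, so its score on the held-out test split can still end up below the naive model's.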

 

 

 

 

Hypothesis Testing

Critical Value

I used Bitcoin.csv, which contains the volume and prices of Bitcoin.

One of the questions in my proposal was whether the volume of Bitcoin has any effect on its price. This graph shows the critical value calculated for the relation between the price of Bitcoin and its volume, which was one indication of my hypothesis. The hypothesis was that a currency’s price can be predicted by finding a relationship with some factor. The null hypothesis is that there is no relation. Since the critical value is greater than 5, we can reject the null hypothesis, and the hypothesis holds.

CyberCurrency (Updated)

Group Members:

  1. Mohamed Zeina (23441)

Introduction

Cryptocurrency is a digital currency in which encryption techniques are used to regulate the generation of units of currency and verify the transfer of funds.

The number of cryptocurrencies available over the internet as of 7 January 2018 is over 1384 and growing.

Here are some of the most viral ones:

  1. Bitcoin
  2. Litecoin
  3. Ethereum
  4. Ripple

Motivation

Cryptocurrency is a hot topic these days and lots of people have been wondering about the path of price fluctuations that these currencies are going through. In this project, I will be focusing on the prices of Bitcoin, Litecoin, Ripple, and Ethereum to try and understand the behavior of their prices.

DataSets:

  1. The first dataset contains the prices of Bitcoin, Litecoin, Ripple, and Ethereum (lows and highs) and the total amount of coins existing from March 2013 until February 2018.
  2. The second dataset contains the number of searches for the word “Bitcoin” on Google.
  3. The third dataset contains the number of searches for the word “Litecoin” on Google.
  4. The fourth dataset contains the number of searches for the word “Ripple” on Google.
  5. The fifth dataset contains the number of searches for the word “Ethereum” on Google.
  6. The sixth dataset contains national holidays in the US.

Questions:

  1. Is the price of these currencies affected by huge sale periods such as “Black Friday”? (I couldn’t implement this.)
  2. Is the price of these currencies affected by holidays such as “Christmas” in the US? (I couldn’t implement this.)
  3. Is the price of Bitcoin dependent on the smaller currencies, and vice versa?
  4. Does the amount (volume) of a single currency existing in the market affect the price of Bitcoin?

 

 

 

 

Cryptocurrency

 

Group Members:

  1. Mohamed Zeina (23441)

Introduction

Cryptocurrency is a digital currency in which encryption techniques are used to regulate the generation of units of currency and verify the transfer of funds. It operates independently of a central bank, which means that its price does not depend on the local market or the local economy.

The number of cryptocurrencies available over the internet as of 7 January 2018 is over 1384 and growing.

Here are some of the most viral ones:

  1. Bitcoin
  2. Litecoin
  3. Ethereum 
  4. Zcash
  5. Ripple

Motivation

Cryptocurrency is a hot topic these days, and lots of people have been wondering about the price fluctuations that these currencies are going through. In this project, I will be focusing on Bitcoin because it is the one that went viral the most, and it also had an impact on the prices of other cryptocurrencies. It would be interesting to find something that correlates with the price of Bitcoin in order to predict its future price for trading purposes.

The two free datasets that I obtained are as follows:

  • The first dataset contains the price of Bitcoin (lows and highs) from March 2013 until February 2018.
  • The second dataset contains the number of searches for the word “Bitcoin” on Google.

I think, and have actually observed, that there is a high correlation between these two datasets. To give a clearer view of how they might be correlated, here’s a graph showing the search count for the word “Bitcoin” in millions:

Search_Graph.png

The data from the first dataset confirms that the high spikes in 2017 coincide with the times when Bitcoin’s price skyrocketed.

Links to obtained datasets: