Machine Learning Model

  • In evaluating our data, we considered all points that ultimately would support our topic;

    • Do the number of incentives offered in counties of California contribute to EV sales?

  • After initial evaluation we identified that the incentives and sales data would ultimately provide the best model for our hypothesis.

  • The demographics data would serve best as supporting data.

  • In evaluating our data, we considered all points that ultimately would support our topic; Do the number of incentives offered in counties of California contribute to EV sales? After initial evaluation we identified that the incentives and sales data would ultimately provide the best model for our hypothesis. The demographics data would serve best as supporting data.

  • Cleaning the data

    • The initial reshaping of the data took place in Jupyter notebook. We read in each of the data sets, identified that there were several nulls that needed to be removed. In addition, we took a fraction of the data given that would demographics data set was large.

  • AWS Relational Database and PG Admin

    • After completion of the initial evaluation, cleaning and reshaping of the data, we created a relation database in AWS and connected it to our PG Admin account. This allowed us to join our datasets and start correlating our metadata.

  • Our Dependent variables were: Ownership/Sales

  • Our Independent variables were: Income, Inventives, Length of Commute, etc.

    • The final 3 highest correlated facors were determined by multiple linear regression analysis

  • Due to COVID-19 creating large variance, we will not include 2020-2021 in our analysis