# Using Jupyter Notebook on QueryPie
Kaggle is one of the largest communities of data scientists, and one of its most-used datasets today relates to the coronavirus (COVID-19). As infection trends update daily around the world, various sources publish relevant data. Among them, the most extensive and best-organized data available comes from Johns Hopkins University.
Many notebooks on Kaggle visualize this data in different ways. In this article, however, we will learn how to save the dataset directly to a database, query it with SQL, and work with it in a Jupyter Notebook using Python.
💡 How to use Corona datasets on QueryPie
1. Download Datasource and Notebook
The original data can be found on the GitHub repository of Johns Hopkins University's Center for Systems Science and Engineering (CSSE). For our example, we will use a notebook listed on Kaggle. The datasets and notebook are linked here for easy follow-up, so we recommend downloading them before you start.
2. Run QueryPie and CSV import
First, create a new database named 'Corona' on your local connection, and name the tables 'confirmed_df', 'deaths_df', and 'recoveries_df' respectively. As mentioned above, all the data can be found here. Go ahead and download these files:
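QueryPie handles this step through its own CSV import wizard, but the same result can be sketched in plain Python. The following is a minimal stand-in, not QueryPie's actual mechanism: it writes a tiny illustrative CSV (standing in for the much larger CSSE time-series file) and loads it into a `confirmed_df` table in an in-memory SQLite database.

```python
import csv
import os
import sqlite3
import tempfile

# Write a tiny two-column sample CSV; the real CSSE file has one column per date.
csv_path = os.path.join(tempfile.mkdtemp(), "confirmed.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Country/Region", "1/22/20", "1/23/20"])
    writer.writerow(["Korea, South", "1", "1"])

# Import the CSV into a 'confirmed_df' table, mirroring the QueryPie import step.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE confirmed_df ("Country/Region" TEXT, "1/22/20" INTEGER, "1/23/20" INTEGER)'
)
with open(csv_path, newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO confirmed_df VALUES (?, ?, ?)", reader)
conn.commit()

rows = conn.execute("SELECT * FROM confirmed_df").fetchall()
print(rows)
```

The same pattern applies to the 'deaths_df' and 'recoveries_df' files.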
3. Execute SQL
Once you have successfully imported the tables, run a simple query to check the data. You can get a bird's-eye view of all the data in the tables with the following SQL statements:
```sql
SELECT * FROM confirmed_df;
SELECT * FROM deaths_df;
SELECT * FROM recoveries_df;
```
4. Run Jupyter
So let's take that dataset and start visualizing it. You can click Run Jupyter to send the query results to the notebook, but before that, make sure to set the Jupyter Notebook path. If Python is already installed, check and pin the version. For the notebook directory, you can either create a new one or specify the path to an existing notebook file (.ipynb).
Next, set up QueryPie Jupyter in the Jupyter List, and then import the notebook file you downloaded earlier (Coronavirus_Visualization_Prediction.ipynb). Here, each variable name is the same as its table name (e.g. confirmed_df, deaths_df, recoveries_df).
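Inside QueryPie, the query results arrive in the notebook as DataFrames already named after the tables. Outside QueryPie, a rough equivalent (an assumption for illustration, not QueryPie's internal mechanism) is to read each table into a same-named DataFrame yourself; the in-memory database and its single row below stand in for the 'Corona' database.

```python
import sqlite3

import pandas as pd

# Build a stand-in 'Corona' database with one table and one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE confirmed_df (country TEXT, cases INTEGER)")
conn.execute("INSERT INTO confirmed_df VALUES ('Korea, South', 833)")
conn.commit()

# Read the table into a DataFrame whose name matches the table name,
# mirroring the naming convention QueryPie uses.
confirmed_df = pd.read_sql("SELECT * FROM confirmed_df", conn)
print(confirmed_df.head())
```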
Run the queries for the three tables and open the notebook path to launch Jupyter, as shown below. You're all set!
5. Check Visualization
Let's break down the data in Jupyter Notebook.
First, check that these packages are all correctly installed: numpy, matplotlib, pandas, sklearn. Then execute the following steps one by one.
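A quick way to run that check without triggering import errors is to probe each package with the standard library's `importlib` (a convenience sketch, not part of the notebook itself):

```python
import importlib.util

# The packages the notebook relies on, as listed above.
REQUIRED = ["numpy", "matplotlib", "pandas", "sklearn"]

def check_packages(names):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED).items():
        print(f"{name}: {'installed' if ok else 'MISSING - try pip install ' + name}")
```

Any package reported as missing can be installed with pip before running the notebook.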
(1) Dataset Import
(2) Package import and data check
Check that the data has been loaded correctly via confirmed_df. From here, we can see the dates, the datetime conversion, and the visualized forecast dates.
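The date handling can be sketched as follows. This is a simplified stand-in for the notebook's code: the CSSE time series starts on 1/22/2020, and the notebook extends the date axis a few days past the observed data for the regression forecast. The `num_days` and `days_in_future` values here are illustrative, and the `future_forcast_dates` spelling matches the notebook's own variable name.

```python
import datetime

start = datetime.date(2020, 1, 22)  # first date in the CSSE time series
num_days = 30                       # days of observed data (illustrative)
days_in_future = 10                 # forecast horizon (illustrative)

# Day indices used as the x-axis for the observed data.
adjusted_dates = list(range(num_days))

# Calendar dates covering both the observed data and the forecast window.
future_forcast_dates = [
    (start + datetime.timedelta(days=i)).strftime("%m/%d/%Y")
    for i in range(num_days + days_in_future)
]
print(future_forcast_dates[0], future_forcast_dates[-1])
```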
```python
plt.figure(figsize=(20, 12))
plt.plot(adjusted_dates, world_cases)
plt.plot(future_forcast_dates, linear_pred, linestyle='dashed')
plt.title('# of Coronavirus Cases Over Time', size=30)
plt.xlabel('Time in Days', size=30)
plt.ylabel('# of Cases', size=30)
plt.legend(['Confirmed Cases', 'Linear Regression Predictions'])
plt.xticks(rotation=50, size=15)
plt.show()
```
As the example chart shows, you can examine coronavirus data from January to February, with a linear regression fitted to the worldwide case counts. All charts are implemented with matplotlib in the corresponding notebook.
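The dashed forecast line in that chart comes from a linear regression. The Kaggle notebook fits it with scikit-learn; the sketch below uses `numpy.polyfit` as an equivalent least-squares fit, and the case counts are illustrative values, not real CSSE figures.

```python
import numpy as np

# Day indices since 1/22/2020 and illustrative cumulative case counts.
days = np.arange(10)
world_cases = np.array([100, 150, 230, 350, 500, 720, 1000, 1400, 1900, 2500])

# Least-squares linear fit: cases ~ slope * day + intercept.
slope, intercept = np.polyfit(days, world_cases, deg=1)

# Extrapolate 5 days past the observed data, like the dashed line in the chart.
future_days = np.arange(len(days) + 5)
linear_pred = slope * future_days + intercept
print(round(float(linear_pred[-1])))
```

Plugging `future_days` and `linear_pred` into the plotting code above reproduces the shape of the chart.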
The key indicators concern COVID-19 confirmed/death/recovery rates and regional status over time, and the full Python code and visualization results can be viewed directly by running the notebook.
#Why Jupyter notebook on QueryPie?
Although this is just a CSV example, for data that changes minute to minute it is most accurate to store it directly in the database and view it from there.
Kaggle also advises you to "Import the data (make sure you update this on a daily basis)," which is why QueryPie connects directly to the changing data source in real time, lets you inspect the tables, and then analyzes them in Jupyter. This way, you get both faster analysis and more accurate data!
✨Want to use SQL and Jupyter Notebook with the same tool? Check out QueryPie!
Thank you to CSSE and therealcyberload for providing the original data and notebook. All sources can be found below: