# Using Jupyter Notebook on QueryPie
Kaggle is one of the largest communities of data scientists, and one of its most-used datasets today relates to the coronavirus (COVID-19). As infection trends update daily around the world, a variety of sources publish relevant data. Among them, the largest and best-organized dataset comes from Johns Hopkins University.
There are many notebooks on Kaggle that visualize this data in different ways. In this article, we will learn how to save the dataset directly to a database, query it with SQL, and work with it in a Jupyter Notebook using Python.
💡 How to use Corona datasets on QueryPie
1. Download Datasource and Notebook
The original data can be found on the GitHub of Johns Hopkins University's Center for Systems Science and Engineering (CSSE). For our example, we will use a notebook listed on Kaggle. The dataset and notebook are both linked here for easy follow-up, so we recommend downloading them before you start.
2. Run QueryPie and CSV import
First, create a new database in Local named 'Corona'. Name the tables 'confirmed_df', 'deaths_df', and 'recoveries_df', respectively. As mentioned above, all the data can be found here. Go ahead and download these files:
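If you prefer to script the CSV import rather than use the UI, a minimal sketch with pandas and SQLite could look like the following. Note that the SQLite backend and the commented-out file names (the usual CSSE time-series names) are assumptions for illustration, not QueryPie's own import mechanism:

```python
import sqlite3
import pandas as pd

def import_csvs(files, db_path):
    """Load each CSV into its own table, replacing any previous import
    so daily re-imports stay up to date."""
    conn = sqlite3.connect(db_path)
    try:
        for table, path in files.items():
            df = pd.read_csv(path)
            df.to_sql(table, conn, if_exists="replace", index=False)
    finally:
        conn.close()

# Hypothetical usage with the CSSE repository's file names:
# import_csvs({
#     "confirmed_df": "time_series_covid19_confirmed_global.csv",
#     "deaths_df": "time_series_covid19_deaths_global.csv",
#     "recoveries_df": "time_series_covid19_recovered_global.csv",
# }, "corona.db")
```

Using `if_exists="replace"` means re-running the import after the daily upstream update simply overwrites the stale tables.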
3. Execute SQL
Once you successfully import the tables, run a simple SQL query to check out the data. All data in the tables can be viewed at once through the SQL statements below:

```sql
SELECT * FROM confirmed_df;
SELECT * FROM deaths_df;
SELECT * FROM recoveries_df;
```
4. Run Jupyter
So let's take that dataset and start fully visualizing it. You can click Run Jupyter to send the results of the query to the notebook, but before that, make sure to set the Jupyter Notebook path. If Python is already installed, check and set the version. For the notebook directory, you can either create a new one or specify the path to an existing notebook file (.ipynb).
Next, you can set up QueryPie Jupyter in the Jupyter List, and then import the ipynb file you downloaded earlier (Coronavirus_Visualization_Prediction.ipynb). Here, each variable name matches its table name (e.g. confirmed_df, deaths_df, recoveries_df).
Run the queries for the three tables, send the results to the notebook at the path you configured, and run Jupyter as shown below. You're all set!
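QueryPie hands the query results to the notebook under the table names. If you wanted to reproduce that wiring by hand, a sketch with pandas might look like this (the SQLite database path is an assumption; QueryPie does this step for you):

```python
import sqlite3
import pandas as pd

def load_tables(db_path, tables):
    """Read each table into a DataFrame keyed by its name, mirroring how the
    notebook expects confirmed_df, deaths_df, and recoveries_df to exist."""
    conn = sqlite3.connect(db_path)
    try:
        return {t: pd.read_sql_query(f"SELECT * FROM {t}", conn) for t in tables}
    finally:
        conn.close()

# Hypothetical usage:
# frames = load_tables("corona.db", ["confirmed_df", "deaths_df", "recoveries_df"])
# confirmed_df = frames["confirmed_df"]
```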
5. Check Visualization
Let's break down the data in Jupyter Notebook.
First, check that the required packages (e.g. sklearn) are properly installed. Then execute the following steps one by one.
(1) Dataset Import
(2) Package import and data check
Check that the data has been loaded correctly via confirmed_df. From here, we can see the date handling, datetime conversion, and forecast dates visualized.
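Before running the cells, a quick dependency check can save a failed run partway through. A minimal sketch follows; only sklearn is named above, so the rest of the package list is an assumption based on a typical visualization notebook:

```python
import importlib.util

def check_missing(required):
    """Return the subset of packages that cannot be imported."""
    return [m for m in required if importlib.util.find_spec(m) is None]

# Packages the notebook likely relies on (assumed apart from sklearn):
print(check_missing(["numpy", "pandas", "matplotlib", "sklearn"]))
```

An empty list means everything is importable and the notebook cells should run.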
```python
plt.figure(figsize=(20, 12))
plt.plot(adjusted_dates, world_cases)
plt.plot(future_forcast_dates, linear_pred, linestyle='dashed')
plt.title('# of Coronavirus Cases Over Time', size=30)
plt.xlabel('Time in Days', size=30)
plt.ylabel('# of Cases', size=30)
plt.legend(['Confirmed Cases', 'Linear Regression Predictions'])
plt.xticks(rotation=50, size=15)
plt.show()
```
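The adjusted and future forecast date axes come out of the notebook's datetime-conversion step. A condensed sketch of how such an axis can be built is below; the 1/22/2020 start date of the series and the window sizes are assumptions here, not the notebook's exact code:

```python
import datetime as dt

def build_forecast_dates(start, num_days, days_ahead):
    """Return the observed dates plus `days_ahead` future dates,
    forming the x-axis for the regression forecast."""
    base = dt.datetime.strptime(start, "%m/%d/%Y")
    return [(base + dt.timedelta(days=i)).strftime("%m/%d/%Y")
            for i in range(num_days + days_ahead)]

# Hypothetical usage: ~40 observed days, forecasting 10 days ahead
# forecast_dates = build_forecast_dates("1/22/2020", 40, 10)
```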
As shown in the example chart, you can see coronavirus data from January to February, with a linear regression fit based on worldwide case counts. All charts are implemented with matplotlib in the corresponding notebook.
The key indicators cover the COVID-19 outbreak, death, and recovery rates as well as regional status over time; the detailed Python code and visualization results can be found by running the notebook directly.
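As a sketch of how such rate indicators are derived, the following is a simplified stand-in (not the notebook's exact code) that assumes per-day global totals as plain lists:

```python
def covid_rates(confirmed, deaths, recovered):
    """Compute daily death and recovery rates as percentages of confirmed cases,
    guarding against days with zero confirmed cases."""
    mortality = [100 * d / c if c else 0.0 for d, c in zip(deaths, confirmed)]
    recovery = [100 * r / c if c else 0.0 for r, c in zip(recovered, confirmed)]
    return mortality, recovery

# Illustrative daily totals:
# m, r = covid_rates([100, 200], [2, 6], [10, 40])  # m == [2.0, 3.0], r == [10.0, 20.0]
```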
# Why Jupyter Notebook on QueryPie?
Although this is just a CSV example, for data that changes minute to minute it is actually most accurate to store and view the data directly in the database.
Kaggle also advises you to "Import the data (make sure you update this on a daily basis)", which is why QueryPie connects directly to the changing data source in real time, inspects the tables, and then analyzes them in Jupyter. This way, you have the advantage of analyzing fast, accurate data!
✨Want to use SQL and Jupyter Notebook with the same tool? Check out QueryPie!
Thanks to CSSE and therealcyberload for providing the original data and notebook. All sources can be found below: