matplotlib.pyplot
Before we jump into the definitions and examples, I want to show you some basic functions of the matplotlib.pyplot subpackage that we’ll see in the examples below. Here, I am assuming that matplotlib.pyplot is imported with the alias plt (import matplotlib.pyplot as plt).

Line Plot: a type of plot that displays information as a series of data points called “markers” connected by straight lines. In this type of plot, the measurement points need to be ordered (typically by their x-axis values). This type of plot is often used to visualize a trend in data over intervals of time, i.e. a time series.

To make a line plot with Matplotlib, we call plt.plot(). The first argument is the data for the horizontal axis, and the second is the data for the vertical axis. This function generates the plot, but it doesn’t display it. To display the plot, we call plt.show(). This is convenient because we might want to customize the plot before displaying it; for example, we might want to add axis labels and a title.
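Here is a minimal sketch of a line plot, assuming some made-up yearly values (the numbers are illustrative, not real data). The matplotlib.use("Agg") line is an addition that selects a non-interactive backend so the sketch also runs without a display; remove it if you want plt.show() to open a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window with plt.show()
import matplotlib.pyplot as plt

# Illustrative data: the x values are ordered, as a line plot expects
years = [2000, 2005, 2010, 2015, 2020]
values = [1.0, 1.4, 1.9, 2.3, 2.8]

# First argument -> horizontal axis, second argument -> vertical axis.
# plt.plot() returns a list of the Line2D objects it created.
lines = plt.plot(years, values)

# plt.plot() only builds the plot; plt.show() is what displays it
plt.show()
```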
The ETL operations work on data objects provided as operands. An operation returns another data object. As mentioned above, the flow of data is only virtual. This means that when we filter the data, the framework might actually compose a SQL WHERE clause instead of pulling the data out of the database and filtering it row by row in Python.
The same applies to the fields in a dataset: if we want to keep only certain columns, why pass all of them around in the first place? Why not ask only for those we actually need at the end? That is what Bubbles should do. Therefore, when used in a SQL context, the keep_fields() operation just selects the requested columns.
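The idea behind pushing filters and column selection down to the database can be illustrated without Bubbles itself, using only Python's built-in sqlite3 module. This sketch does not use or reproduce the Bubbles API, and the sales table is invented for the example. It contrasts pulling all rows and columns into Python with letting SQL do the filtering (WHERE) and the column selection (the SELECT list).

```python
import sqlite3

# Made-up in-memory table for illustration only
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL, note TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "apples", 10.0, "first"),
        ("south", "pears", 5.0, "second"),
        ("north", "pears", 7.5, "third"),
    ],
)

# Approach 1: fetch every row and every column, then filter and drop
# columns in Python, row by row
all_rows = conn.execute("SELECT * FROM sales ORDER BY rowid").fetchall()
in_python = [
    (product, amount)
    for region, product, amount, note in all_rows
    if region == "north"
]

# Approach 2: push the filter into a WHERE clause and request only the
# needed columns, so the database does the work and less data is transferred
in_sql = conn.execute(
    "SELECT product, amount FROM sales WHERE region = ? ORDER BY rowid",
    ("north",),
).fetchall()

print(in_python == in_sql)  # True: same result, different amount of data moved
```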
There might be multiple implementations of the same operation, and which implementation (function) is used is determined at pipeline execution time. For example, aggregate() might be an in-Python, row-by-row aggregation using a dictionary, or it might be SUM() or AVG() with a GROUP BY clause in SQL, depending on which kind of object is passed to the operation.
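Bubbles' own dispatch mechanism is not shown here. As a hedged illustration of "one operation, several implementations, chosen by the type of the operand", this sketch uses Python's standard functools.singledispatch. The SQLTable class and this aggregate() signature are hypothetical: a list of dicts gets the in-Python dictionary aggregation, while a SQL-backed object only gets a composed SUM() ... GROUP BY statement.

```python
from collections import defaultdict
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class SQLTable:
    """Hypothetical stand-in for a SQL-backed data object: just a table name."""
    name: str


@singledispatch
def aggregate(obj, key, measure):
    # Fallback when no implementation is registered for this object type
    raise TypeError(f"no aggregate() implementation for {type(obj).__name__}")


@aggregate.register
def _(obj: list, key, measure):
    # In-Python, row-by-row aggregation using a dictionary
    totals = defaultdict(float)
    for row in obj:
        totals[row[key]] += row[measure]
    return dict(totals)


@aggregate.register
def _(obj: SQLTable, key, measure):
    # SQL implementation: compose a statement instead of touching any rows.
    # Toy only: identifiers are interpolated without any escaping.
    return f"SELECT {key}, SUM({measure}) FROM {obj.name} GROUP BY {key}"


rows = [
    {"region": "north", "amount": 10.0},
    {"region": "south", "amount": 5.0},
    {"region": "north", "amount": 7.5},
]
print(aggregate(rows, "region", "amount"))               # {'north': 17.5, 'south': 5.0}
print(aggregate(SQLTable("sales"), "region", "amount"))  # SELECT region, SUM(amount) FROM sales GROUP BY region
```

The caller writes aggregate(...) in both cases; which function runs is decided at call time by the type of the first argument, which mirrors the choice described in the paragraph above.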
The following image shows how the most appropriate operation is chosen for you depending on the data source. It also shows that, for certain representations, the operations are combined to produce a single query to the source system.
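Combining several operations into one query can be sketched with a small lazy object that only records what was asked for and emits a single SQL statement when executed. The LazyQuery class below is hypothetical and is not the Bubbles API; its filter() and keep_fields() methods merely borrow the operation names from the text. It runs against an invented sqlite3 table.

```python
import sqlite3
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical lazy SQL-backed object: operations are recorded, not executed."""
    table: str
    columns: tuple = ("*",)
    conditions: tuple = ()
    params: tuple = ()

    def filter(self, column, value):
        # Record an equality condition; nothing is sent to the database yet
        return replace(self,
                       conditions=self.conditions + (f"{column} = ?",),
                       params=self.params + (value,))

    def keep_fields(self, *columns):
        # Record the projection; only these columns will be requested
        return replace(self, columns=columns)

    def to_sql(self):
        # All recorded operations collapse into a single statement
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql

    def execute(self, conn):
        # The only point where the database is actually queried
        return conn.execute(self.to_sql(), self.params).fetchall()


# Invented data for the demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("north", "apples", 10.0), ("south", "pears", 5.0),
                  ("north", "pears", 7.5)])

query = LazyQuery("sales").filter("region", "north").keep_fields("product", "amount")
print(query.to_sql())  # SELECT product, amount FROM sales WHERE region = ?
print(query.execute(conn))
```

Each method returns a new immutable object, so a pipeline can be built step by step without side effects, and the database sees one statement instead of one round trip per operation.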
- plt.title(“My Title”) will add a title “My Title” to your plot
- plt.xlabel(“Year”) will add a label “Year” to your x-axis
- plt.ylabel(“Population”) will add a label “Population” to your y-axis
- plt.xticks([1, 2, 3, 4, 5]) sets the tick positions on the x-axis to 1, 2, 3, 4, 5. We can also pass tick labels as a second argument. For example, plt.xticks([1, 2, 3, 4, 5], ["1M", "2M", "3M", "4M", "5M"]) places the labels 1M, 2M, 3M, 4M, 5M at those positions on the x-axis.
- plt.yticks() - works the same as plt.xticks(), but for the y-axis.
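Putting the functions from the list above together gives something like the following sketch. The data points are made up, the titles, labels and tick values are the ones used in the list, and matplotlib.use("Agg") is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Illustrative data only
x = [1, 2, 3, 4, 5]
y = [10, 20, 15, 30, 25]

plt.figure()                 # start a fresh figure
plt.plot(x, y)
plt.title("My Title")        # title above the plot
plt.xlabel("Year")           # x-axis label
plt.ylabel("Population")     # y-axis label
# Tick positions on the x-axis, with custom text labels as the second argument
plt.xticks([1, 2, 3, 4, 5], ["1M", "2M", "3M", "4M", "5M"])
plt.yticks([10, 20, 30])     # same idea for the y-axis, positions only

ax = plt.gca()               # handle on the current axes, handy for inspection
plt.show()
```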
How to make Bubble Charts with matplotlib
In this post we will see how to make a bubble chart using matplotlib. The snippet we are going to see was inspired by a tutorial on flowingdata.com, where R is used to make a bubble chart representing data, extracted from a CSV file, about crime rates in the United States by state. I used the dataset provided by FlowingData to create a similar chart with Python. Let's see the code: