Review of Python Functions

June 28, 2025 | by Bloom Code Studio

This appendix provides a summary of Python functions used in this textbook. The intent is to provide students with a cross-reference of Python commands that includes a description of the Python functions, general syntax for usage, and a link to the section where the function is first used in the text.

Please note this is a very high-level description of these functions. Many functions require specific libraries to be installed. For more details on Python functions, syntax, and usage, please refer to the Python documentation posted online.
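Because many of the functions in the table come from third-party libraries, it can help to check which libraries are available before running an example. The following is a minimal sketch using only the standard library. The list of package names is an assumption inferred from the prefixes and function names used in the table (for example, pd for pandas and plt for matplotlib); the appendix itself does not name the packages, and the helper check_packages is hypothetical.

```python
from importlib.util import find_spec

# Assumed top-level package names (not stated in the appendix itself).
ASSUMED_PACKAGES = [
    "pandas", "numpy", "matplotlib", "scipy",
    "statsmodels", "sklearn", "seaborn", "tensorflow",
]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Report which of the assumed packages are installed in this environment.
    for name, available in check_packages(ASSUMED_PACKAGES).items():
        status = "installed" if available else "missing"
        print(f"{name}: {status}")
```

Any package reported as missing would need to be installed before the corresponding functions can be used.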

Python Function | Description | Syntax | First Reference
What Are Data and Data Science?
print() | Prints a specified message or specified values to the screen or other output device | print("text") or print(x, y) | Python Basics for Data Science
pd.read_csv() | Loads data from a CSV (comma-separated values) file and stores it in a DataFrame | pd.read_csv(path_to_csv_datafile) | Python Basics for Data Science
DataFrame.describe() | Returns a table with basic statistics for a dataset, including min, max, mean, count, and quartiles | DataFrame.describe(), where DataFrame is the name of the DataFrame | Python Basics for Data Science
DataFrame.iloc[] | Allows access to data in a DataFrame using integer-based row/column indexes | DataFrame.iloc[row, column], where DataFrame is the name of the DataFrame | Python Basics for Data Science
DataFrame.loc[] | Accesses a group of rows and columns by labels or a Boolean array | DataFrame.loc[criteria], where DataFrame is the name of the DataFrame | Python Basics for Data Science
plt.scatter() | Generates a scatterplot for (x, y) data | plt.scatter(x_data, y_data) | Python Basics for Data Science
plt.title() | Specifies a title for a chart | plt.title("Title") | Python Basics for Data Science
plt.xlabel() | Specifies a label for the x-axis | plt.xlabel("x-axis label") | Python Basics for Data Science
plt.ylabel() | Specifies a label for the y-axis | plt.ylabel("y-axis label") | Python Basics for Data Science
plt.xlim() | Specifies limits to use for x-axis numbering | plt.xlim(lower, upper) | Python Basics for Data Science
plt.ylim() | Specifies limits to use for y-axis numbering | plt.ylim(lower, upper) | Python Basics for Data Science
Collecting and Preparing Data
pd.read_html() | Reads HTML tables from a web page and returns them as a list of DataFrames | pd.read_html(URL) | Web Scraping and Social Media Data Collection
pd.to_numeric() | Converts strings or other data types to numeric values | pd.to_numeric(column_name) | Web Scraping and Social Media Data Collection
len() | Returns the length of an object | len(object) | Web Scraping and Social Media Data Collection
re.findall() | Returns all non-overlapping matches of a specified pattern in a string | re.findall(pattern, string) | Web Scraping and Social Media Data Collection
re.search() | Searches a string for the first match of a specified pattern (returns a match object, or None if there is no match) | re.search(pattern, string) | Web Scraping and Social Media Data Collection
Descriptive Statistics: Statistical Measurements and Probability Distributions
binom.pmf() | Calculates the probability mass function (PMF) for a binomial distribution. It gives the probability of having exactly x successes in n trials with success probability p. | binom.pmf(x, n, p), where x is the number of successes in the experiment, n is the number of trials in the experiment, and p is the probability of success | Discrete and Continuous Probability Distributions
round() | Rounds a numeric result to a specified level of precision | round(number, digits) | Discrete and Continuous Probability Distributions
poisson.pmf() | Calculates probabilities associated with the Poisson distribution | poisson.pmf(x, mu), where x is the number of events of interest and mu is the mean of the Poisson distribution | Discrete and Continuous Probability Distributions
norm.cdf() | Calculates probabilities associated with the normal distribution (returns the area under the normal probability density function to the left of a specified measurement) | norm.cdf(x, mu, std), where x is the measurement of interest, mu is the mean of the normal distribution, and std is the standard deviation of the normal distribution | Discrete and Continuous Probability Distributions
Inferential Statistics and Regression Analysis
t.ppf() | Generates the value of the t-distribution corresponding to a specified area under the t-distribution curve and specified degrees of freedom | t.ppf(area_to_left, degrees_of_freedom) | Statistical Inference and Confidence Intervals
bootstrap() | Performs the bootstrap process to generate a confidence interval | bootstrap(data, statistic, confidence_level, number_resamples) | Statistical Inference and Confidence Intervals
norm.interval() | Calculates a confidence interval for the mean when the population standard deviation is known, given the sample mean, population standard deviation, and sample size (uses the normal distribution). Note: Standard error is the standard deviation divided by the square root of the sample size. | norm.interval(conf_level, sample_mean, standard_error) | Statistical Inference and Confidence Intervals
t.interval() | Calculates a confidence interval for the mean when the population standard deviation is unknown, given the sample mean, sample standard deviation, and sample size (uses the t-distribution). Note: Standard error is the standard deviation divided by the square root of the sample size. | t.interval(conf_level, degrees_freedom, sample_mean, standard_error) | Statistical Inference and Confidence Intervals
proportion_confint() | Calculates a confidence interval for a proportion (uses the normal distribution) | proportion_confint(success, sample_size, alpha) | Statistical Inference and Confidence Intervals
ttest_1samp() | Returns the value of the test statistic and the two-tailed p-value for a one-sample hypothesis test using the t-distribution | ttest_1samp(data_array, null_hypothesis_mean) | Hypothesis Testing
ttest_ind_from_stats() | Returns the value of the test statistic and the two-tailed p-value for a two-sample hypothesis test using the t-distribution | ttest_ind_from_stats(sample_mean1, sample_standard_deviation1, sample_size1, sample_mean2, sample_standard_deviation2, sample_size2) | Hypothesis Testing
np.array() | Creates a numerical array from a list-like object | np.array(object) | Correlation and Linear Regression Analysis
pearsonr() | Calculates the value of the Pearson correlation coefficient r | pearsonr(x_data, y_data) | Correlation and Linear Regression Analysis
linregress() | Generates a linear regression model and provides slope, y-intercept, and other regression-related output | linregress(x_data, y_data) | Correlation and Linear Regression Analysis
f_oneway() | Returns both the F test statistic and the p-value for the one-way ANOVA hypothesis test | f_oneway(Array1, Array2, Array3, …) | Analysis of Variance (ANOVA)
Time Series and Forecasting
plot() | Generates a time series plot | plot(dataframe) | Introduction to Time Series Analysis
rolling() | Provides rolling window calculations | rolling(window=window) | Time Series Forecasting Methods
mean() | Computes the average of a dataset | mean(dataset) | Time Series Forecasting Methods
diff() | Computes the first-order difference of data in a window | diff(dataframe) | Time Series Forecasting Methods
plot_acf() | Plots the ACF (autocorrelation function) for a time series, up to lag L | plot_acf(time_series_data, lags=L) | Time Series Forecasting Methods
STL() | Decomposes a time series with known period P into its components | STL(time_series_data, period=P) | Time Series Forecasting Methods
ewm() | Performs exponential moving average (EMA) smoothing | ewm(dataframe) | Time Series Forecasting Methods
adfuller() | Performs the Augmented Dickey-Fuller (ADF) test, which is a statistical test for checking the stationarity of a time series | adfuller(time_series_data) | Time Series Forecasting Methods
ARIMA() | Fits an ARIMA(p, d, q) (AutoRegressive Integrated Moving Average) model to time series data | ARIMA(time_series_data, order=(p, d, q)) | Time Series Forecasting Methods
Decision-Making Using Machine Learning Basics
LogisticRegression() | Creates a logistic regression model | LogisticRegression() | Classification Using Machine Learning
model.fit() | Trains a machine learning model on a given dataset | model.fit(feature_matrix, target_vector) | Classification Using Machine Learning
KMeans() | Sets up a k-means clustering model (Use model.fit() to fit the model to a dataset.) | KMeans(n_clusters=k) | Classification Using Machine Learning
DBSCAN() | Sets up a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) model (Use model.fit() to fit the model to a dataset.) | DBSCAN(options) | Classification Using Machine Learning
confusion_matrix() | Tabulates the performance of a model by comparing actual and predicted values | confusion_matrix(target_values, predicted_values) | Classification Using Machine Learning
LinearRegression() | Fits a linear regression model to data | LinearRegression().fit(feature_matrix, target_vector) | Machine Learning in Regression Analysis
predict() | Used on trained machine learning models to generate predictions for new data points | predict(feature_matrix) | Machine Learning in Regression Analysis
DecisionTreeClassifier() | Sets up a decision tree model (Use model.fit() to fit the model to a dataset.) | DecisionTreeClassifier(options) | Decision Trees
ens.RandomForestRegressor() | Sets up a random forest model (Use model.fit() to fit the model to a dataset.) | ens.RandomForestRegressor(options) | Other Machine Learning Techniques
GaussianNB() | Sets up a Naïve Bayes classification model (Use model.fit() to fit the model to a dataset.) | GaussianNB() | Other Machine Learning Techniques
Deep Learning and Artificial Intelligence (AI) Basics
Perceptron() | Sets up a perceptron model (Use model.fit() to fit the model to a dataset.) | Perceptron() | Introduction to Neural Networks
train_test_split() | Splits a dataset randomly into train and test subsets, using a proportion P of the data for the test set | train_test_split(input_data_arrays, target_data, test_size=P) | Introduction to Neural Networks
StandardScaler() | Used to standardize features by removing the mean and scaling to unit variance | StandardScaler() | Introduction to Neural Networks
accuracy_score() | Calculates the accuracy of a classification model as the ratio of the number of correct predictions to the total number of predictions | accuracy_score(y_true, y_predicted) | Introduction to Neural Networks
scaler.fit_transform() | Fits a scaler to the data and then transforms the data according to the fitted scaler | scaler.fit_transform(array) | Introduction to Neural Networks
scaler.transform() | Applies a previously fitted scaler to new data | scaler.transform(array) | Introduction to Neural Networks
tf.keras.Sequential() | Creates a linear stack of layers for building a neural network model | tf.keras.Sequential(layers, additional options) | Backpropagation
model.compile() | Used to configure the learning process of a neural network model before training | model.compile(optimizer, loss, metrics) | Backpropagation
Visualizing Data
boxplot() | Creates a box-and-whisker plot | plt.boxplot(array) | Encoding Univariate Data
hist() | Creates a histogram | plt.hist(array) | Encoding Univariate Data
plot() | Creates 2D line plots such as a time series graph | plt.plot(x_data, y_data) | Graphing Probability Distributions
bar() | Creates a bar chart | plt.bar(x_array, heights) | Graphing Probability Distributions
imshow() | Displays an image on a 2D regular raster, such as a heatmap | plt.imshow(array) | Geospatial and Heatmap Data Visualization Using Python
heatmap() | Creates a heatmap visualization | sns.heatmap(array) | Geospatial and Heatmap Data Visualization Using Python
colorbar() | Adds a colorbar (a scale showing how colors map to values) to a figure | plt.colorbar() | Multivariate and Network Data Visualization Using Python
corr() | Calculates the pairwise correlations of columns in a DataFrame | dataframe.corr() | Multivariate and Network Data Visualization Using Python
add_subplot() | Adds a subplot to a figure stored in fig | fig.add_subplot(position) | Multivariate and Network Data Visualization Using Python
ax.scatter() | Creates a scatterplot | ax.scatter(x_data, y_data) | Multivariate and Network Data Visualization Using Python
Reporting Results
plot_tree() | Creates a visualization of a decision tree | plot_tree(estimator, feature_names) | Validating Your Model
DataFrame.info() | Provides a concise summary of a DataFrame’s structure and content | DataFrame.info() | Validating Your Model
DataFrame.drop() | Removes rows or columns from a DataFrame | DataFrame.drop(labels, axis=rows_columns) | Validating Your Model
score() | Evaluates the performance of a trained model on a given dataset | model.score(feature_matrix, true_labels) | Validating Your Model
dt.get_depth() | Retrieves the depth of the decision tree, dt | dt.get_depth() | Validating Your Model
cross_val_score() | Evaluates a model’s performance using cross-validation | cross_val_score(estimator, feature_matrix, target_variable) | Validating Your Model
GridSearchCV() | Searches for the best parameters for a specified estimator, using k-fold cross-validation | GridSearchCV(estimator, parameters, cv=k) | Validating Your Model

Table D1
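
A few of the functions in Table D1 (print(), len(), round(), re.findall(), and re.search()) are part of Python itself and need no extra libraries. The short sketch below shows them working together; the sample sentence and the variable names are made up for illustration and do not come from the textbook.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Monday shipped 4 items, Tuesday shipped 3 items, Wednesday shipped 3 items."

# re.findall() returns every non-overlapping match; with one capture group,
# it returns the captured text of each match as a list of strings.
counts = re.findall(r"shipped (\d+)", text)

# re.search() returns a match object for the first match, or None if none exists.
first_match = re.search(r"\d+", text)
first_number = int(first_match.group()) if first_match else None

# len() gives the number of items; round() limits the result to 2 decimal places.
average = round(sum(int(c) for c in counts) / len(counts), 2)

# print() accepts several values separated by commas.
print(counts, first_number, average)
```

Note that re.findall() returns strings, so the values are converted with int() before any arithmetic.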