Predicting Travel Fares with Machine Learning | Matheus de Souza e Silva | Oct 2024


Unlocking the Secrets of Ride Fare Prediction with Data Analysis


In this project, we dive into a dataset of ride data, aiming to predict the fare amount based on various variables such as distance traveled, number of passengers, time of day, and more. Accurate fare predictions can help transportation companies optimize their services and provide a better experience for their users.

The main goal is to develop a predictive model that uses this data to provide reliable fare estimates.

The first step involves loading the data and checking the integrity of the information. Using the Pandas library, we explore the first few rows of the dataset, identify any missing values, and apply appropriate data cleaning when necessary.

import pandas as pd

# Load the dataset
df = pd.read_csv('uber.csv')

# View the first few rows of the dataset
df.head()

After the initial analysis, we noticed that some variables needed treatment, such as null values and inconsistent entries. We applied a cleaning process so that the dataset was ready for analysis and modeling.
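As a rough illustration of that cleaning step, the sketch below counts missing values per column and then drops rows that are incomplete or implausible. The validity rules (positive fares, between 1 and 6 passengers) and the tiny sample table are assumptions made for this example, not the exact rules applied in the project.

```python
import pandas as pd


def clean_rides(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values and obviously inconsistent entries."""
    cleaned = df.dropna()
    # Assumed rules: fares must be positive, and a ride carries 1 to 6 passengers.
    valid = (cleaned["fare_amount"] > 0) & cleaned["passenger_count"].between(1, 6)
    return cleaned[valid].reset_index(drop=True)


# Tiny made-up sample standing in for the real file
sample = pd.DataFrame({
    "fare_amount": [7.5, -3.0, 12.0, None, 9.0],
    "passenger_count": [1, 2, 0, 1, 3],
})

# Missing values per column
print(sample.isnull().sum())

cleaned_sample = clean_rides(sample)
print(cleaned_sample)
```

On the real data, the same function could be called as clean_rides(df) once the thresholds have been checked against the actual value ranges.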


In the exploratory data analysis (EDA) phase, we examined the available variables to understand their relationships with the fare amount. We generated scatter plots to observe how the key variables correlate with the fare.

Variables Used:

  • distance_km: The distance traveled during the trip, in kilometers.
  • passenger_count: The number of passengers on the trip.
  • hour: The hour of the day at which the trip took place.
  • distance_from_center: The distance from the city center.
  • is_holiday: Indicator of whether the trip day was a holiday.
  • is_weekend: Indicator of whether the trip took place on a weekend.
  • season: The season in which the trip occurred.
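Several of these variables are not raw columns and have to be derived. The sketch below shows one possible way to do it, assuming the raw data has a pickup timestamp and pickup/dropoff coordinates. The column names (pickup_datetime, pickup_latitude, and so on), the haversine formula for distance, the season encoding, and the city-center coordinates are all assumptions for illustration. The is_holiday flag is left out because it requires a holiday calendar.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def add_features(df, center):
    """Derive model features; `center` is a (latitude, longitude) pair."""
    out = df.copy()
    pickup = pd.to_datetime(out["pickup_datetime"])
    out["distance_km"] = haversine_km(
        out["pickup_latitude"], out["pickup_longitude"],
        out["dropoff_latitude"], out["dropoff_longitude"],
    )
    out["distance_from_center"] = haversine_km(
        out["pickup_latitude"], out["pickup_longitude"], center[0], center[1]
    )
    out["hour"] = pickup.dt.hour
    # Monday is 0, so Saturday and Sunday are 5 and 6
    out["is_weekend"] = (pickup.dt.dayofweek >= 5).astype(int)
    # Assumed encoding (northern hemisphere): 0=winter, 1=spring, 2=summer, 3=autumn
    out["season"] = (pickup.dt.month % 12) // 3
    return out


# Made-up rows; the center coordinates are a placeholder, not the project's value
rides = pd.DataFrame({
    "pickup_datetime": ["2024-10-05 14:30:00", "2024-01-03 08:15:00"],
    "pickup_latitude": [0.0, 40.0],
    "pickup_longitude": [0.0, -74.0],
    "dropoff_latitude": [0.0, 40.0],
    "dropoff_longitude": [1.0, -74.0],
})
featured = add_features(rides, center=(40.0, -74.0))
print(featured[["distance_km", "distance_from_center", "hour", "is_weekend", "season"]])
```

The first sample row spans one degree of longitude along the equator, which comes out at about 111 km and serves as a quick sanity check on the distance function.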

Here is an example of a scatter plot generated during the exploratory analysis:

    
        
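import matplotlib.pyplot as plt

# Creating scatter plots of the first six features against the fare.
# `features` (list of column names) and `df_filtered` (the cleaned DataFrame)
# are assumed to be defined earlier in the notebook.
fig, axes = plt.subplots(3, 2, figsize=(15, 12))

for idx, feature in enumerate(features[:6]):
    # Map the running index to a position in the 3x2 grid
    row, col = divmod(idx, 2)
    axes[row, col].scatter(df_filtered[feature], df_filtered['fare_amount'], alpha=0.5)
    axes[row, col].set_xlabel(feature)
    axes[row, col].set_ylabel('fare_amount')
    axes[row, col].set_title(f'{feature} vs Fare Amount')

plt.tight_layout()
plt.show()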


Through this visual analysis, we observed some interesting trends, such as the influence of trip distance and time of day on the fare amount.
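The relationships seen in scatter plots can also be summarized numerically. As a minimal sketch, the snippet below computes the Pearson correlation of each feature with fare_amount and ranks the features. The numbers in the sample table are made up for illustration and are not results from the project.

```python
import pandas as pd

# Made-up values, used only to show the mechanics
sample = pd.DataFrame({
    "distance_km": [1.0, 2.5, 4.0, 6.0, 10.0],
    "passenger_count": [1, 3, 1, 2, 1],
    "fare_amount": [5.0, 8.0, 11.5, 16.0, 25.0],
})

# Correlation of every feature with the target, strongest positive first
correlations = (
    sample.corr()["fare_amount"]
    .drop("fare_amount")
    .sort_values(ascending=False)
)
print(correlations)
```

Pearson correlation only captures linear relationships, so a feature such as hour, whose effect may rise and fall over the day, can show a weak coefficient even when the scatter plot suggests a pattern.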
