What App Should I Develop?
An insight into what app to develop using app data from the apple store. For a young app developer looking to break into a profitable app category.
The Stakeholder
In this case, our stakeholder is an aspiring app developer who needs data driven insight to decide what app to develop. Seeking answers to questions like: What app categories are most popular? What price should I set for my app? How can i maximize my user ratings?
Data Context
The dataset included information covering the apps available on the appstore, dataset from Kaggle.com. One dataset called applestore which included app details of pps available on the appstore: Appnames, Sizing Bytes, Ratings, Supporting Device, Number of supported Languages and more.
One other dataset for the apps description, which explains how the app interfaces with its userbase and what solution it provides, and a glimpse of how it might be perceived on the market. The columns included app unique identity, size in bytes, description and more.
Connecting to the Dataset
Using SQL-Lite online allows for a seamless online connection to a dataset without the need to install dependencies, and setting up database on a local machine.
Exploratory Data Analysis (EDA)
The dataset was combined from four separate tables, because SQL-LIte online does not take in large datasets. The four were combined into one dataset using *Union All* an SQL function that one unified data table from multiple tables.
Exploring the data to reveal issues in data inconsistencies, data errors and outliers before perfoming the actual data analysis.
---combining the multiple tables into one
create table appleStore_description_combined AS
select * from appleStore_description1
union ALL
select * from appleStore_description2
union ALL
select * from appleStore_description3
union ALL
select * from appleStore_description4
---now on to the "EDA" Exploratory Data Analysis
select COUNT(DISTINCT id) as UniqueAppIDs
from AppleStore
---checking the number of unique apps in both tablesAppleStore
select count(DISTINCT id) as UniqueAppIDs
from appleStore_description_combined
--checking for any missing valuesAppleStore in any key fields of the tables
select * from AppleStore
limit 10
---checking for missing values in some key columns
select count(*) as MissingValues
from AppleStore
where track_name is null
or prime_genre is null
or user_rating is null
---cheking for missing values in the app description dataset
select count(*) as MissingValues
from appleStore_description_combined
where app_desc is null
--find out the number of apps per genreAppleStore
--this gives an idea of the distritubion of the app dev genreAppleStore
--- to identify some dominant app genres
select prime_genre, count(*) as NumApp
from AppleStore
group by prime_genre
order by NumApp desc
--getting an overview of the apps ratings
select max(user_rating) as MaxRating,
min(user_rating) as MinRating,
Avg(user_rating) as AvgRating
from AppleStore

Drawing Insights from the Data
Questions to be answered in the analysis: what insights we need to draw from the data
- Explore if paid apps have higher ratings than free apps.
- Find out if apps that support a higher number of languages have higher ratings.
- Explore the genres with low ratings.
- Explore the correlation between the app description length and app user ratings.
- Find the top rated apps in each app category/genre.
--insights or takeaway points for our stakeholderAppleStore
select CASE
when price > 0 then 'Paid'
else 'Free'
end as App_Type, avg(user_rating) as 'AvgRating'
from AppleStore
GROUP by App_Type
--explore if apps that support more languages have higher ratingsAppleStore
select CASE
when lang_num < 10 then '< 10 Languages'
when lang_num BETWEEN 10 and 30 then '10-30 Languages'
else '> 30 Languages'
end as Languages_Bucket, avg(user_rating) as Avg_Rating
from AppleStore
group by Languages_Bucket
order by Avg_rating desc
--Check the genres with low ratingsAppleStore
--The ones that might have a better opportunity to create an app in
select prime_genre, avg(user_rating) as Avg_Rating
from AppleStore
group by prime_genre
ORDER by Avg_Rating ASC
limit 10
--check the correlation btwn app user rating and description lengthAppleStore
Select case
when length(appdesc.app_desc) < 500
then 'Short'
when length(appdesc.app_desc) between 500 and 1000
then 'Medium'
else 'Long'
end as Desc_Length, avg(user_rating) as UserRating
from AppleStore as appst
join appleStore_description_combined as appdesc
on appst.id = appdesc.id
group by Desc_Length
order by UserRating desc
--top rated apps for each genre (what to emmulate when developing the app, top rated app in each genre)
select prime_genre, track_name, user_rating
from ( select prime_genre,
track_name,
user_rating,
rank() over (partition by prime_genre ORDER by user_rating desc,
rating_count_tot desc) as Rank
from AppleStore) as a
where a.Rank = 1
Recomendations from the Analysis
- Paid versus free apps: Paid apps generally achieve slightly higher ratings than free apps, users who pay for an app may perceive the app more valuable and so are much more likely to engange in rating.
- Language support:Apps that support more than 10 languages but less than 30 languages have better ratings compared to those that support more that 30 languages and those that support less than 10 languages.
- High perfoming genres:Categories like Finace and Books have existing apps that have a very low user ratings indicating an opportunity gap for app development as a result of user needs not being fully met.
- App description length:A detailed, well crafted app description is essential for the success of an app as the length of the app's descritpion sets clear expectation from the user. From the data exploration, the length of the description is directly correlated with higher user app ratings.
- The average rating of apps on the appstore is 3.5, and the highest rating is 5. To stand out from the crowd, developer needs to aim for a rating higher than the average user rating.
- Games, with 3862 apps, Seconded by Entertainment with 535 apps, and Education with 453 apps suggest that these three top genres are saturated and therefore showing high competition on the development side of things. This however also indicate a high user demand in these genres.
Let's Connect!
Thank you for checking out my portfolio! I hope you enjoyed exploring my projects. You can also explore more of my data visualizations on Tableau Public. Feel free to view some of my brand identity development projects on Behance.
Tikhala :-)