Instagram System Design
Overview
If you are on social media, there is a good chance you have used Instagram at least once. Have you ever wondered how the app works so smoothly, and how you would proceed if you had to build a similar app? In this article we will cover how to design an app similar to Instagram.
Requirements
Functional
- A user should be able to upload an image or video to their profile.
- A user should be able to see uploads from the users they follow.
- A user should be able to follow other users.
- A user can search for an image or video by title.
Non Functional
- The user feed latency should be low.
- We are okay with eventual consistency, since an uploaded image can become visible to other users after a few milliseconds.
- Our app needs to be highly available.
- The data store used for images and videos should be reliable, and uploaded data should not be lost.
Additional Requirements
- Users can add tags to a photo/video.
- Users can put comments on a post.
- Users can search photos by tags.
Capacity Estimation
Traffic estimation
Let’s assume we have 500 million daily active users and we get roughly 5 million uploads per day.
Total number of uploads: 5,000,000 / 86,400 seconds ~ 58 per second
Storage estimation
Let’s assume an average upload size of 200 KB.
Then the total storage required for 1 day is 200 KB * 5 million uploads ~ 1 TB per day.
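The arithmetic behind these estimates can be checked with a short script. This is only a sketch under the article's assumptions (5 million uploads per day, 200 KB per upload); the yearly figure is an extra assumption that ignores growth and replication.

```python
# Back-of-the-envelope capacity estimate using the article's assumptions.
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

uploads_per_day = 5_000_000              # from the article
avg_upload_size_kb = 200                 # from the article

uploads_per_second = uploads_per_day / SECONDS_PER_DAY

# Decimal units: 1 TB = 10^9 KB.
storage_per_day_tb = uploads_per_day * avg_upload_size_kb / 1_000_000_000

# Assumption (not in the article): flat traffic, no replication overhead.
storage_per_year_tb = storage_per_day_tb * 365

print(f"uploads/sec  ~ {uploads_per_second:.1f}")
print(f"storage/day  ~ {storage_per_day_tb:.1f} TB")
print(f"storage/year ~ {storage_per_year_tb:.0f} TB")
```

Replicating the data for reliability would multiply the storage figures by the replication factor.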
API
1. Upload image
POST: /image
Request {
  image: MultipartFile
  title: String
}
Response {
  imageId: String
}
2. Get feed
GET: /feed?pageNumber={pageNumber}&limit={limit}
Response {
  feeds: Feed[]
}
Feed {
  imageId: String
  imageUrl: String
  title: String
}
3. Follow user
POST: /follow/{followingUserId}
4. Search image
GET: /search?keyword={searchKeyword}
Response {
  feeds: Feed[]
}
Feed {
  imageId: String
  imageUrl: String
  title: String
}
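To make the feed API's pageNumber and limit parameters concrete, here is a minimal sketch of how a server might slice a feed into pages. The Feed fields come from the API above; the paginate helper, the sample data, and the choice of 1-based page numbers are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feed:
    # Field names mirror the Feed object in the API definition.
    imageId: str
    imageUrl: str
    title: str


def paginate(items: List[Feed], page_number: int, limit: int) -> List[Feed]:
    """Return one page of items. Assumes page numbers start at 1."""
    if page_number < 1 or limit < 1:
        raise ValueError("pageNumber and limit must be positive")
    start = (page_number - 1) * limit
    return items[start:start + limit]


# Hypothetical feed of five items, newest first.
feeds = [
    Feed(f"img-{i}", f"https://cdn.example.com/img-{i}.jpg", f"Photo {i}")
    for i in range(1, 6)
]

page_two = paginate(feeds, page_number=2, limit=2)
print([f.imageId for f in page_two])
```

Requesting a page past the end of the feed returns an empty list, which a client can treat as the signal to stop scrolling.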
Design Consideration
- Data store for uploaded images: Our system is read-heavy, so we need a data store from which an uploaded image can be fetched quickly and rendered in the user's application. A few things to keep in mind: 1) The data store should be reliable, as we do not want a user’s uploaded images to get lost. 2) Users can upload as many images as they like, so the data store should scale to billions of images. 3) Latency should be low when retrieving photos. We can use object storage, such as AWS S3, for the uploaded images. Other storage types exist, such as file storage and block storage, but given the factors above, object storage is the right fit for our design because it offers low read latency and manages a huge number of objects efficiently.
- Data store for user data and upload metadata: We now have a data store for the uploaded images. We also need a database for user data and for the metadata of user uploads. Things to keep in mind while choosing it: 1) The database should be highly available. 2) It should have low read latency, as our system is read-heavy. 3) It should scale to billions of records. 4) It should be reliable and should support sharding and replication. Given these factors, we do not see a requirement for a relational database and can go for a key-value NoSQL database. For this design we choose AWS DynamoDB to store user data and image upload metadata.
Database Design
We need the following tables to store our data:
1. Table to store user data
userId: string [HashKey]
name: string
emailId: string
creationDateInUtc: long
2. Table to store follower data
followingUserId_followerUserId: string [HashKey]
followingUserId: string [RangeKey]
followerUserId: string
creationDateInUtc: long
We can’t choose followingUserId alone as the hash key, because it would create unbalanced partitions: some users have millions of followers. Such users are known as hot users. To keep the partitions balanced, we use a combination of followingUserId and followerUserId as the hash key.
3. Table to store user uploads
uploadId: string [HashKey]
userId: string [RangeKey]
imageLocation: string
uploadDateInUtc: long
caption: string
4. Table to store the user feed data
userId: string [HashKey]
uploadId: string
creationDateInUtc: long [RangeKey]
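The hot-user concern raised for the follower table can be illustrated with a small simulation. This is a sketch only: the partition count, the user names, and the MD5-based partition_for function are hypothetical stand-ins and do not reflect how DynamoDB actually assigns partitions.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, for illustration only


def partition_for(hash_key: str) -> int:
    """Map a hash key to a partition with a stable hash (not DynamoDB's real one)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# One hot user followed by 10,000 hypothetical followers.
hot_user = "celebrity"
followers = [f"user{i}" for i in range(10_000)]

# Option A: followingUserId alone as the hash key.
# Every follower row of the hot user lands on the same partition.
by_following = Counter(partition_for(hot_user) for _ in followers)

# Option B: followingUserId_followerUserId as the hash key.
# The rows spread across all partitions.
by_composite = Counter(partition_for(f"{hot_user}_{f}") for f in followers)

print("followingUserId only:", len(by_following), "partition(s) used")
print("composite key:       ", len(by_composite), "partition(s) used")
print("largest composite partition:", max(by_composite.values()), "rows")
```

One trade-off to keep in mind: a DynamoDB Query needs the exact hash key value, so with the composite key, listing all followers of a user needs a separate access path, which this article does not specify.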
High Level Design
Component Details
- Client: The mobile and desktop applications that connect to the backend servers via the REST APIs defined above.
- Load balancer: We use load balancers to distribute traffic across servers. This makes our system more available: if a server behind a load balancer goes down, the load balancer routes traffic to the remaining servers.
- Image Service: This service provides the APIs to upload an image and to get image metadata. The metadata API returns the image path in S3, which clients use to load the image in their application.
- S3: We use object storage for the images users upload. AWS S3 is a scalable and cheap object store that we can use here. We can integrate it with AWS CloudFront so that images load much faster in the user's application.
- CloudFront: Amazon CloudFront is a content delivery network (CDN) service built for high performance, security, and developer convenience. With CloudFront, images render faster in the user's application.
- Image DDB: We use AWS DynamoDB to store the metadata of user uploads, as discussed in the section above.
- SNS: On every user upload we publish a notification through AWS Simple Notification Service. This supports further processing such as monitoring, feed generation, and analytics. Multiple SQS queues can subscribe to this SNS topic to receive the events.
- SQS: An AWS Simple Queue Service queue subscribes to the upload-event SNS topic, and the feed generation service consumes the events from this queue.
- Feed generation service: This service is responsible for generating user feeds. It receives upload events via SQS and starts the feed generation process. The service will handle millions of events, and its low-level design deserves a separate discussion, which we will cover in a future article.
- Feed DDB: We use DynamoDB to store the user feed data. The feed generation service interacts with DDB to update user feeds.
- Redis Cache: To keep read latency low for our users, we place a caching layer between the feed generation service and DDB. When a request to fetch a user’s feed arrives, the service first checks the Redis cache; if the feed is not there, it fetches the feed from DDB and returns the response.
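The feed read path described for the Redis cache can be sketched as a cache-aside lookup. Everything here is a hypothetical stand-in: FeedStore plays the role of the Feed DDB table, a plain dict plays the role of Redis, and writing the result back into the cache on a miss is an assumption the article does not state.

```python
from typing import Dict, List, Tuple


class FeedStore:
    """Stand-in for the Feed DDB table: userId -> [(creationDateInUtc, uploadId)]."""

    def __init__(self) -> None:
        self.rows: Dict[str, List[Tuple[int, str]]] = {}
        self.reads = 0  # counts how often the "database" is hit

    def query_latest(self, user_id: str, limit: int) -> List[str]:
        self.reads += 1
        newest_first = sorted(self.rows.get(user_id, []), reverse=True)
        return [upload_id for _, upload_id in newest_first[:limit]]


class FeedReader:
    """Cache-aside reader: check the cache first, fall back to the store."""

    def __init__(self, store: FeedStore, limit: int = 20) -> None:
        self.store = store
        self.limit = limit
        self.cache: Dict[str, List[str]] = {}  # stand-in for Redis

    def get_feed(self, user_id: str) -> List[str]:
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached                      # cache hit
        feed = self.store.query_latest(user_id, self.limit)
        self.cache[user_id] = feed             # assumption: populate on miss
        return feed

    def invalidate(self, user_id: str) -> None:
        """Drop a cached feed, e.g. after the feed table is updated."""
        self.cache.pop(user_id, None)


store = FeedStore()
store.rows["alice"] = [(100, "u1"), (300, "u3"), (200, "u2")]
reader = FeedReader(store)

first = reader.get_feed("alice")    # miss: reads from the store
second = reader.get_feed("alice")   # hit: served from the cache
print(first, store.reads)
```

A real deployment would also need an expiry or invalidation policy so that cached feeds do not go stale after new uploads; the invalidate method hints at one option.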
Bottlenecks & Future Extensions
- Create an explore page where uploads from public accounts are shown to a user based on their preferences and past activity.
- User feed generation. The current approach has a bottleneck: the service listens to every upload event and updates the feed of every follower of the uploading user. This becomes a problem when a hot user (a user with a huge number of followers) uploads. We will discuss this in future articles.
- More advanced search based on parameters such as tags, location, caption, and title.
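The feed generation bottleneck mentioned above comes from writing to every follower's feed on each upload. The following sketch shows how the write count grows with the follower count; the follower lists, function name, and in-memory feeds dict are hypothetical and stand in for the follower table and the Feed DDB.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical follower lists: uploader -> followers.
followers_of: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "celebrity": [f"fan{i}" for i in range(100_000)],
}

# Stand-in for the feed table: userId -> [(creationDateInUtc, uploadId)].
feeds: DefaultDict[str, List[Tuple[int, str]]] = defaultdict(list)


def handle_upload_event(uploader_id: str, upload_id: str, created_utc: int) -> int:
    """Append the upload to every follower's feed and return the number of writes."""
    writes = 0
    for follower_id in followers_of.get(uploader_id, []):
        feeds[follower_id].append((created_utc, upload_id))
        writes += 1
    return writes


small = handle_upload_event("alice", "u1", 1)
large = handle_upload_event("celebrity", "u2", 2)
print(f"regular user upload: {small} feed writes")
print(f"hot user upload:     {large} feed writes")
```

A single upload by the hot user triggers one feed write per follower, which is why this step needs a different strategy at scale.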