Originally published on my blog.
“Tell me, what goes on in your day at work?” is one of my favorite questions to ask during interviews. In this article, I will answer this question for myself. I will discuss what it is like to be a data scientist (from my perspective) at work. I will start by providing some context about where I work and what the structure of my work is like, then I will dive into the details of my workday.
I am part of the Personalization Team, which focuses on Recommendation Systems. Our goal is to solve the classic recommendation problem of “what to serve to whom and when”. The team consists of a little more than a dozen folks, and we handle things end-to-end, with some help from our colleagues from the central engineering team (our platforms team).
I work as a Lead Scientist, so I am also responsible for some managerial tasks. I own the recommendation systems for a particular class of devices, and two more data scientists work with me. Being in a startup-like environment, we are pretty much independent, handling everything from problem formulation, data analysis, and model building to deployment, monitoring, and metrics. While the engineering team integrates the recommendation services into their serving layer, we collaborate with the Product teams to discuss, brainstorm, and formulate solutions. The Platform team helps us out on some tasks, ranging from automating our model building and deployments, to libraries for easier infrastructure management, to frameworks that help us focus more on modeling tasks.
This week I’m on-call. So modeling tasks, data crunching, etc. are something I will avoid since the outages can demand frequent context switches. I usually use this time to clear tech debt and polish existing systems, models, and components. So let’s begin.
The first thing I look at is the Opsgenie alerts. While our recommendation services are critical for the business, the setup with Serving is such that if our services fail, a rule-based scoring mechanism takes over as the fallback, so the end-user experience is not affected too drastically. The product mainly runs on Indian devices, so ‘2 am on-call’ support is neither expected nor needed. I check for alerts and handle them in the morning when I sit at my desk, unless something really bad happens, of course.
We have set up three main types of alerts: pipeline failures, prediction service latencies, and infrastructure issues for these deployments. We’ve got a bunch of Spark pipelines that run periodically, processing hourly or daily data. They could be pipelines required by our models, pipelines that power our dashboards, or pipelines that create generic datasets that downstream teams consume. I look at which pipelines caused the alerts, take the necessary action to rectify the issues, and backfill data where required. Some of the pipelines are alien to me, so I just redirect the alert to the pipeline owner; this is easy for us since every resource is labeled with a maintainer tag. These are Spark jobs and we have a lot of data, a lot. With hundreds of millions of daily active users, the user engagement data is huge, and we need clusters of machines to handle it. Occasionally, some pipeline goes down.
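To make the fallback idea concrete, here is a minimal sketch of how a serving layer might try a model-based scorer and drop to a rule-based one on timeout or error. It is not our actual implementation: the function names, the 50 ms timeout, and the recency rule are all assumptions for illustration.

```python
import asyncio


def recency_rule(items):
    """Hypothetical rule-based scorer: newer items score higher."""
    return {item["id"]: 1.0 / (1.0 + item["age_hours"]) for item in items}


async def score_with_fallback(model_scorer, rule_scorer, user_id, items, timeout_s=0.05):
    """Try the model scorer first; on timeout or any error, use the rule-based scorer.

    Returns a (scores, source) pair so callers can track how often the fallback fires.
    """
    try:
        scores = await asyncio.wait_for(model_scorer(user_id, items), timeout=timeout_s)
        return scores, "model"
    except Exception:
        # Covers timeouts and service errors alike (assumed policy for this sketch).
        return rule_scorer(items), "fallback"


async def failing_model(user_id, items):
    """Stand-in for a prediction service that is down."""
    raise ConnectionError("prediction service unavailable")


async def slow_model(user_id, items):
    """Stand-in for a prediction service that exceeds its latency budget."""
    await asyncio.sleep(1.0)
    return {item["id"]: 0.9 for item in items}


async def healthy_model(user_id, items):
    """Stand-in for a prediction service that responds in time."""
    return {item["id"]: 0.9 for item in items}
```

Returning the source alongside the scores is a design choice that makes it easy to alert on the fallback rate rather than on individual failures.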
The next item I focus on is our metrics dashboards. We track a lot of business metrics for each model deployed: total number of impressions, overall duration spent on the platform, opt-outs, revenue, all types of engagements, etc. A quick check will tell if things are looking good or not. For example, impressions dropping or opt-outs increasing are red flags. If I’m in the middle of an A/B experiment (we almost always are), decisions on which variant to deprecate and which one to scale up are made based on these metrics. The dashboards are built primarily on Superset and Tableau. They can fail to update because of pipeline failures, and the queries behind them can fail because of an overloaded Trino cluster. When that happens, we have to clean up the mess.
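The "quick check" can be thought of as a comparison against a baseline. The sketch below is only an illustration of that idea: the metric names, the 10% threshold, and the choice of a relative change are assumptions, not what our dashboards actually compute.

```python
def red_flags(baseline, current, threshold=0.10):
    """Return metrics that moved in the bad direction by more than `threshold`.

    `baseline` and `current` map metric names to values. Changes are relative
    to the baseline, so 0.10 means a 10% move.
    """
    # -1: a drop is bad (impressions); +1: a rise is bad (opt-outs).
    bad_direction = {"impressions": -1, "opt_outs": +1}
    flags = {}
    for name, direction in bad_direction.items():
        if name not in baseline or name not in current or baseline[name] == 0:
            continue  # nothing to compare against
        change = (current[name] - baseline[name]) / baseline[name]
        if direction * change > threshold:
            flags[name] = round(change, 4)
    return flags
```

For example, with a baseline of 1000 impressions and 50 opt-outs, a day with 850 impressions and 52 opt-outs would flag only impressions, since the opt-out rise stays under the threshold.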
If things look good, I move on to our JIRA board’s current sprint plan.
We work in two-week sprints. There is a lot of debate about using sprints for data science-y stuff because of the non-deterministic nature of what we do. Nevertheless, we use them primarily to track the items we are working on and to give our leaders some idea of how the team’s bandwidth looks. I plan the activities and tasks for my team, so the board gives me a sense of what they are working on and whether there’s something I need to help with. After the morning standup, where we discuss updates and blockers on these items, work mode starts.
Currently, I am working on enhancing a streaming service that processes items in real time. This is a completely asyncio-based solution that annotates items with a ton of metadata and embeddings for downstream tasks. It is one of the primary source-of-truth pipelines used org-wide, so enhancements like additional tag extraction and monitoring improvements are tasks I usually pick up. The latest addition enables this service to sink data into Redis caches and Vertex Feature Store entities (another low-latency cache), so that items are available for scoring (recommending) as soon as they are published and end-users get the latest content recommendations. (The real-time prediction services we use rely on these caches.)
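For readers unfamiliar with the pattern, here is a minimal asyncio sketch of "annotate an item, then write it to several caches". It is not the actual service: `InMemorySink` is a hypothetical stand-in for a real cache client such as Redis or a feature store, and the `annotate` step is a placeholder for the real tag and embedding extraction.

```python
import asyncio


class InMemorySink:
    """Hypothetical stand-in for a low-latency cache client."""

    def __init__(self):
        self.store = {}

    async def write(self, key, value):
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.store[key] = value


async def annotate(item):
    """Placeholder annotation: derive simple tags from the title."""
    return {**item, "tags": sorted(set(item["title"].lower().split()))}


async def worker(queue, sinks):
    """Pull items off the queue, annotate them, and write to every sink."""
    while True:
        item = await queue.get()
        try:
            enriched = await annotate(item)
            # Write to all sinks concurrently rather than one after another.
            await asyncio.gather(*(sink.write(enriched["id"], enriched) for sink in sinks))
        finally:
            queue.task_done()


async def process(items, sinks, n_workers=2):
    """Run a small worker pool over `items` and wait until all are written."""
    queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    workers = [asyncio.create_task(worker(queue, sinks)) for _ in range(n_workers)]
    await queue.join()
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
```

Error handling and retries are left out of this sketch; a production worker would need to decide what happens when an annotation or a single sink write fails.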
After lunch, I usually attend meetings or have one-on-one discussions with my team members to check on their progress and see if they need help with any tasks. If there are no pressing issues or meetings, I continue working on my current tasks. I am also part of an ongoing discussion on re-architecting the current recommendation systems to be compatible with an event-based serving system, so that they are more real-time in nature. This involves discussions around latency, infrastructure decisions, and possible hiccups that we have to anticipate. People are also floating some ideas for edge ML that I could do a proof-of-concept for, but more on that when I get to it. While I refrain from modeling and data analysis tasks this week, I try to pass my thinking on to the team so that they can take it forward. The latest has been to rethink how we define item popularity and how we quantify the ‘trending’ nature of the content we serve. This is coupled with a more robust way to define user-item category affinities, one that can also be applied to sparse users, the users we don’t have a lot of data for, and with a recommendation model built on this ‘category affinity’ approach. I’ll probably write an extensive article about this one.
Another round of health checks, and I’m done.