Meet Benjamin - Pinata’s Data Scientist
What inspired you to pursue Data Science?
Looking back, the first experience that genuinely piqued my interest in working with data was an Econometrics class (basically programming statistical models using data) I took during my undergrad at UT (Hook ‘Em!). It was inspiring to see that data could be used to make useful predictions about the world around us and to uncover ideas we wouldn’t know about otherwise.
How did you start your career?
The first job I got out of college was at an economic consulting firm. Within the first year or so, I started concentrating heavily on analytics and statistical modeling work there. That first year, going from only sort of understanding how to do data work to doing it every day, really solidified my interest in it. From there, my career and even some hobbies outside of work have largely revolved around learning more and getting better at data science and data engineering. Eventually, I went back to school for a master’s degree to dive deeper into more complex topics in data engineering and AI. What’s exciting, too, is that the recent explosion in open-source AI and machine learning (ML) has really opened the door to what people can do with data, both professionally and as a hobby.
What are some of the most interesting projects you have worked on?
I think the most interesting project I’ve worked on recently was testing and understanding how machine learning models can be integrated into a sub-discipline of statistics called causal inference. Without getting too much into the weeds, causal inference is about going beyond correlation to establish actual cause-and-effect relationships, e.g., does providing ChatGPT to workers make them more productive? I think it's really cool to see how ML methods can be integrated to help answer questions like this, and I’m constantly thinking about how to leverage these methods in the context of Pinata’s data.
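To make that a little more concrete, here is a minimal sketch of one popular way to combine the two, often called double machine learning: flexible ML models soak up the confounding variation, and a simple regression on the residuals estimates the effect. The simulated data, column meanings, and model choices below are hypothetical illustrations, not Pinata’s actual setup.

```python
# A minimal double machine learning (DML) sketch: use ML models to
# partial out confounders, then estimate a treatment effect from the
# residuals. Data and models here are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Simulated data: X = worker covariates, t = has ChatGPT access (0/1),
# y = productivity. The true effect of t on y is set to 2.0.
n = 2000
X = rng.normal(size=(n, 5))
t = (X[:, 0] + rng.normal(size=n) > 0).astype(float)
y = 2.0 * t + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Step 1: predict treatment and outcome from covariates with flexible
# ML models, using cross-fitting to avoid overfitting bias.
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, t, cv=5)
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=200), X, y, cv=5)

# Step 2: regress the outcome residuals on the treatment residuals.
# The slope is the estimated average treatment effect.
t_res, y_res = t - t_hat, y - y_hat
ate = (t_res @ y_res) / (t_res @ t_res)
print(f"Estimated effect on productivity: {ate:.2f}")  # should land near 2.0
```

The nice part is that the ML models only have to predict well; the final causal estimate still comes from a simple, interpretable regression step.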
I also like to experiment with Large Language Models (LLMs), and it's been really fun to see what kinds of tasks they excel at, especially when fine-tuning them for specific use cases.
What have you enjoyed most about working at Pinata?
I really enjoy working with the team here, above all else. Aside from the team, though, I actually quite enjoy how working for a startup challenges me. Working for Pinata has forced me to learn a lot of things that I frankly wouldn’t have learned at a larger company. A good example was first setting up our data stack to run in the cloud: there was a lot of pain, but also a lot of learning, in taking a project that big from end to end, from writing the first lines of SQL to configuring the entire thing to run on its own in the cloud. This is probably kind of dorky, but I derive a lot of joy from seeing our data stack running well every day.
How do you see the role of data science evolving in the next few years?
I have two big takeaways in terms of how I see data science evolving in the next few years. The first is already happening, and lots of others have said this, but as data science matures as a career, I think we will see the arguably vague job title of “data scientist” break out into several more defined roles: 1) data engineer, 2) data analyst, 3) machine learning engineer, and 4) data scientist (statistical modeling). As data and AI systems grow more complex, specialization becomes helpful, and the job market is starting to realize that each of these roles has its own skill set (though there is some overlap). The funny thing is that because Pinata is a startup, I actually do a mix of all four roles, which I think is super cool, but I’m more of an exception.
The other takeaway is that while the ‘AI bubble’ we are currently in will eventually burst, I do think companies will start thinking about ways in which more powerful models like LLMs can be used to automate very repetitive tasks where perfect accuracy is not critical. This will probably only start happening, though, once these models become cheaper, simpler, and more lightweight to run. For example, it’s easy to imagine a person using an AI tool to read and search through thousands of natural-language documents for specific facts, and then having that same person validate the output for errors, a task that is pretty common in the legal and accounting industries.
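As a rough sketch of what that human-in-the-loop workflow could look like in code, consider the outline below. The `ask_llm` function is a purely hypothetical stand-in for whatever model API you’d actually call; the point is the division of labor, where the model does the repetitive reading and the person stays responsible for correctness.

```python
# A sketch of the human-in-the-loop document review pattern described
# above: an LLM does a first pass over many documents, and a person
# validates each extracted answer before it is trusted.
from dataclasses import dataclass

@dataclass
class Extraction:
    document_id: str
    answer: str
    validated: bool = False

def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a call to a real LLM API."""
    raise NotImplementedError("wire this up to your model of choice")

def first_pass(documents: dict[str, str], question: str) -> list[Extraction]:
    """Have the model propose an answer for every document."""
    results = []
    for doc_id, text in documents.items():
        prompt = f"Question: {question}\n\nDocument:\n{text}\n\nAnswer briefly:"
        results.append(Extraction(doc_id, ask_llm(prompt)))
    return results

def human_review(extractions: list[Extraction]) -> list[Extraction]:
    """A person checks each proposed answer; keep only validated ones."""
    for ex in extractions:
        reply = input(f"[{ex.document_id}] '{ex.answer}' correct? (y/n) ")
        ex.validated = reply.strip().lower() == "y"
    return [ex for ex in extractions if ex.validated]
```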