How to Become a Data Engineer in 2021

11-Jan-2021

Data Engineering skills 2021.

The demand for data engineers is growing rapidly: it reportedly increased by 45% in 2019. The median salary for data engineers in the SF Bay Area is around $160k. So the question is: how do you become a data engineer?
What Data Engineering is
As the name suggests, data engineering is closely related to data. But while data analytics usually means extracting insights from existing data, data engineering means building the infrastructure to deliver, store and process that data. In The AI Hierarchy of Needs, the data engineering process sits at the very bottom: Collect, Move & Store, Data Preparation. So if your organization wants to be data- or AI-driven, it should hire or train data engineers.

But what do data engineers actually do? The amount of data grows every single day. We are witnessing a new era in which anybody can create content from a mobile phone or other gadget, and even small devices are connected to the Internet. In the past, data engineers were responsible for writing complex SQL queries and building ETL (extract, transform & load) processes using big enterprise tools like Informatica ETL, Pentaho ETL, Talend etc. But now the market demands a broader skillset. If you want to work as a data engineer you need to have:

  • Intermediate knowledge of SQL and Python

  • Experience working with cloud providers like AWS, Azure or GCP

  • Knowledge of Java/Scala is a big plus

  • Understanding of SQL/NoSQL databases (data modeling, data warehousing, performance optimization)
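
To make the ETL term mentioned above concrete, here is a minimal sketch in Python using only the standard library. The input data, the `payments` table and the function names are made up for illustration; a real pipeline would read from files, APIs or message queues and load into a proper warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file, API, or queue.
RAW_CSV = """user_id,country,amount
1,us,10.50
2,DE,7.25
3,us,not_a_number
4,fr,3.00
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: normalize country codes and drop rows with invalid amounts."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed records
        cleaned.append((int(row["user_id"]), row["country"].upper(), amount))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
load(conn, transform(extract(RAW_CSV)))

# Once loaded, the data can be queried with plain SQL.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM payments GROUP BY country ORDER BY country"
).fetchall()
print(totals)  # [('DE', 7.25), ('FR', 3.0), ('US', 10.5)]
```

The three steps are kept as separate functions so that each one can be tested or replaced on its own, which is roughly what the enterprise ETL tools do at a larger scale.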

This skillset is very similar to what backend engineers usually know. In practice, when an organization's data is growing, a backend engineer is the ideal candidate to move into a data engineering role.

The particular technologies and tools differ depending on company size, data volumes and data velocity. FAANG companies, for example, usually require:

  • Knowledge of Python, Java or Scala

  • Experience working with big data tools like Apache Hadoop, Kafka and Spark

  • Solid knowledge of algorithms and data structures

  • Understanding of distributed systems

  • Experience with Business Intelligence tools like Tableau, QlikView, Looker or Superset

Data Engineer's Skillset
Data engineering is an engineering discipline, which is why knowledge of computer science fundamentals is required, especially an understanding of the most popular algorithms and data structures (Hello Mr. Cormen!).

Because data engineers deal with data on a daily basis, understanding how databases work is a huge plus. For example, the most popular SQL databases such as SQLite, PostgreSQL and MySQL use B-tree data structures under the hood.
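
One way to see why this matters is to ask a database how it plans to run a query. The sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The `users` table, the `idx_users_email` index and the `query_plan` helper are made up for illustration, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's human-readable plan for a query (hypothetical helper)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the textual description of the step.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

# Without an index on email, SQLite has to scan every row of the table.
plan_before = query_plan(conn, lookup, args)
print(plan_before)

# An ordinary SQLite index is stored as a B-tree keyed on the indexed column.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index in place, the lookup becomes a search through that B-tree.
plan_after = query_plan(conn, lookup, args)
print(plan_after)
```

The first plan reports a full table scan, while the second one reports a search that uses `idx_users_email`. The query itself did not change; only the data structure available to the database did.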

