In a recent blog post, Netflix has shared insights into its ongoing efforts to enhance cloud resource efficiency, highlighting some of the challenges it faces in managing its significant usage of Amazon Web Services (AWS). The post, authored by senior analytics engineer Jennifer H and data professional Pallavi Phadnis, reveals that even a tech giant like Netflix is striving to gain comprehensive control over its cloud expenditure.

The blog outlines how Netflix's engineering teams are empowered with self-service tools to provision applications in the cloud. However, to facilitate better resource utilisation and understanding of associated costs, Netflix operates a dedicated Platform Data Science Engineering (DSE) team. This team's primary focus is to assist engineering units in making informed decisions regarding resource usage.

In pursuit of this goal, the Platform DSE team has developed two essential tools: Foundational Platform Data (FPD) and Cloud Efficiency Analytics (CEA). FPD provides a centralised data layer with a consistent data model and a standardised methodology for processing platform data. CEA, built on top of FPD, provides an analytics layer that delivers time series efficiency metrics tailored to various business use cases.

The FPD tool aggregates data from applications such as Apache Spark, which logs the allocation time of processing cores and the volume of data processed. Subsequently, CEA utilises this inventoried data, as well as ownership and usage information, applying specific business logic to generate detailed cost and ownership insights across diverse granularities.
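The two-layer flow described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Netflix's implementation: the record fields, the `RATE_PER_CORE_HOUR` price, and the function names `build_inventory` and `cost_by_owner` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical flat price per core-hour; the post says real cost
# heuristics are unique to each platform.
RATE_PER_CORE_HOUR = 0.05


def build_inventory(usage_records):
    """FPD-like step: sum core-hours per (day, application)."""
    inventory = defaultdict(float)
    for rec in usage_records:
        inventory[(rec["day"], rec["app"])] += rec["core_hours"]
    return dict(inventory)


def cost_by_owner(inventory, owners):
    """CEA-like step: join inventory with ownership and apply a cost rule."""
    costs = defaultdict(float)
    for (day, app), core_hours in inventory.items():
        # Applications without a known owner are kept visible, not dropped.
        owner = owners.get(app, "unattributed")
        costs[(day, owner)] += core_hours * RATE_PER_CORE_HOUR
    return dict(costs)


# Made-up sample data for illustration only.
records = [
    {"day": "2024-01-01", "app": "etl_a", "core_hours": 100.0},
    {"day": "2024-01-01", "app": "etl_a", "core_hours": 50.0},
    {"day": "2024-01-01", "app": "etl_b", "core_hours": 40.0},
]
owners = {"etl_a": "team_data"}

inventory = build_inventory(records)
costs = cost_by_owner(inventory, owners)
```

Keeping the inventory step separate from the cost step mirrors the split between a shared data layer and an analytics layer: the same inventory can feed different business rules without being recomputed.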

The complexity of the datasets generated at Netflix is attributed to the extensive infrastructure and the specific features of its platforms. Jennifer H and Pallavi Phadnis noted the intricacies involved, mentioning that “services can have multiple owners, cost heuristics are unique to each platform, and the scale of infra data is large.” They also highlighted the customisation requirements inherent to Netflix’s platforms, which result in a constant workload for the Platform DSE team, including routine audits.
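To see why multiple owners complicate attribution, consider one simple way a shared cost might be divided. The sketch below splits a cost in proportion to per-owner weights; the proportional rule, the `split_cost` name, and the sample weights are assumptions made for illustration, since the post does not say how Netflix handles this case.

```python
def split_cost(total_cost, owner_weights):
    """Split a cost across owners in proportion to their weights.

    `owner_weights` maps an owner name to a non-negative weight,
    for example that owner's share of requests (hypothetical).
    """
    total_weight = sum(owner_weights.values())
    if total_weight <= 0:
        raise ValueError("owner weights must sum to a positive number")
    return {
        owner: total_cost * weight / total_weight
        for owner, weight in owner_weights.items()
    }


# Made-up example: one service shared by two teams at a 3:1 ratio.
shares = split_cost(120.0, {"team_a": 3, "team_b": 1})
```

With these inputs, `team_a` is assigned 90.0 and `team_b` 30.0. Any such rule is a policy choice, which is part of why heuristics tend to differ between platforms.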

The challenges of achieving data completeness and correctness are compounded by factors such as upstream latency and the transformations necessary for the data to be adequately prepared for analysis. As such, the development of both the FPD and CEA tools is ongoing, with the team aspiring to achieve nearly comprehensive visibility into costs within the forthcoming year.

In a forward-looking statement, the post hints at Netflix’s plans to adopt more proactive methodologies, signalling a shift towards predictive analytics and machine learning to optimise usage and better identify cost anomalies. This is notable, given that Netflix has acknowledged its current difficulties in fully controlling cloud spending, which suggests that such challenges are common across the industry rather than unique to any one company.
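The post does not describe how anomalies would be detected. As a purely illustrative baseline, and a much simpler technique than the machine learning the post alludes to, the sketch below flags days whose cost sits far from the mean of a trailing window. The function name, window size, threshold, and sample figures are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(daily_costs, window=7, threshold=3.0):
    """Return indices of days whose cost is more than `threshold`
    standard deviations away from the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against.
            continue
        if abs(daily_costs[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up daily costs with one obvious spike at index 7.
costs = [100, 102, 98, 101, 99, 100, 103, 250, 101]
anomalies = flag_anomalies(costs)
```

Here only index 7 is flagged. The following day is not, because the spike itself inflates the trailing window's spread, a known weakness of this kind of baseline.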

Source: Noah Wire Services