Tag: python

Spark with Python

Spark is a cluster computing framework that uses in-memory primitives to enable programs to run up to a hundred times faster than Hadoop MapReduce applications. Spark applications consist of …

Running Pig

Pig offers multiple modes that configure how Pig scripts and Pig statements are executed. Execution Modes: Pig has two execution modes, local and MapReduce. Running …
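
As a rough illustration of the two execution modes named in the excerpt, the sketch below builds the command line for each one. It relies on Pig's `-x` flag for selecting the mode; the helper `pig_command` and the script name `wordcount.pig` are hypothetical names introduced here, and the command is only printed, never run.

```python
import shlex

# Mode names accepted by Pig's -x flag, matching the two modes in the excerpt.
PIG_MODES = ("local", "mapreduce")


def pig_command(script_path, mode="mapreduce"):
    """Build the argument list for running a Pig script in the given mode.

    Hypothetical helper for illustration; it does not invoke Pig itself.
    """
    if mode not in PIG_MODES:
        raise ValueError(f"unknown Pig execution mode: {mode!r}")
    return ["pig", "-x", mode, script_path]


# Print the command for each mode instead of executing it.
for mode in PIG_MODES:
    print(shlex.join(pig_command("wordcount.pig", mode=mode)))
```

Returning an argument list rather than a single string keeps the command safe to hand to `subprocess.run` later without shell quoting concerns.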

Pig and Python

Pig is composed of two major parts: a high-level data flow language called Pig Latin, and an engine that parses, optimizes, and executes the Pig Latin scripts as a …

Working with Snakebite in Python

Snakebite is a Python package, created by Spotify, that provides a Python client library, allowing HDFS to be accessed programmatically from Python applications. The client library uses protobuf messages …

Interacting with HDFS

Interacting with the Hadoop Distributed File System (HDFS) is primarily done from the command line using the script named hdfs. The hdfs script has the following usage: $ hdfs COMMAND …

The Zen of Python

For a long time, the programming language Perl was the mainstay of the Internet. Most interactive websites in the early days were powered by Perl scripts. The Perl community’s …

Working with comments in Python

Comments are an extremely useful feature in most programming languages. Everything you’ve written in your programs so far is Python code. As your programs become longer and more complicated, …
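
A minimal sketch of what comments look like in Python, assuming a made-up rectangle calculation purely for illustration: everything after a `#` on a line is ignored by the interpreter.

```python
# A full-line comment: explain why the code exists, not just what it does.
width = 3
height = 4

area = width * height  # an inline comment can follow code on the same line

print(area)
```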

Numbers and Floats in Python

Numbers are used quite often in programming to keep score in games, represent data in visualizations, store information in web applications, and so on. Python treats numbers in several …
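
The following sketch shows two of the ways Python treats numbers, integers and floats. The variable names are illustrative only and not taken from the post.

```python
import math

score = 2 + 3      # integer arithmetic stays an int
ratio = 3 / 2      # the / operator always produces a float
whole = 3 // 2     # floor division of two ints stays an int
total = 0.1 + 0.2  # floats are binary approximations, so this is not exactly 0.3

print(type(score).__name__, type(ratio).__name__, type(whole).__name__)
print(total == 0.3)              # exact comparison of floats is unreliable
print(math.isclose(total, 0.3))  # compare floats with a tolerance instead
```

Using `math.isclose` rather than `==` is the usual way to compare floats when small rounding differences are acceptable.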