Sunday, 25 November 2012

BigData: HDFS FeatureClass ETL and MapReduce GPTool

This post is dedicated to my Esri colleagues Ajit D. and Philip H. for their invaluable help. This is a work in progress, but I've put a good dent in it and would like to share it with you. In this post, we will go through a complete cycle, where from ArcMap, we will: export a FeatureClass to an HDFS folder; register that folder as a Hive table; run command line Hive queries; execute Hive queries...

Friday, 23 November 2012

BigData: Cloudera Impala and ArcPy

So at the last Strata+Hadoop World, Cloudera introduced Impala. I downloaded the demo VM, installed the TPC-DS data set (read the impala_readme.txt once the VM starts up) and tested some of the queries. It was pretty fast and cool. As of this writing, UDFs and SerDes are missing from this beta release, so I cannot do my Spatial UDF operators, nor can I read JSON formatted HDFS records :-( One of the...

Monday, 08 October 2012

Streaming Big Data For Heatmap Visualization using Storm

A record number of tweets was set during the 2012 Presidential debate. If you wondered how this was technologically possible, then Event Stream Processing is your answer. Actually, Twitter open sourced such an implementation called Storm. Pretty impressive piece of technology! So, I wanted to try it out with a "geo" twist. To get started, I recommend that you read "Getting Started with Storm". Here...

Monday, 24 September 2012

Processing Big Data with Apache Hive and Esri ArcPy

Data Scientists, if you are processing and analyzing spatial data and are using Python, then ArcPy should be included in your arsenal of tools and ArcMap should be utilized for geospatial data visualization. Following the last post, where I extended Apache Hive with spatial User Defined Functions (UDFs), in this post I will demonstrate the usage of the "extended" Hive within Python and how to...

Monday, 17 September 2012

Big Data, Spatial Hive, Sequence Files

Following the last post, where we used Pig to analyze data stored in HDFS, in this post we will be using Hive and spatially enabling it for geo analysis. Hive enables you to write SQL-like statements in a language called HiveQL, which Hive converts to a MapReduce job that is submitted to Hadoop for execution. Again, if you know SQL, then learning HiveQL is very easy and intuitive. Hive is...

Tuesday, 28 August 2012

Big Data, Spatial Pig, Threaded Visualization

This post is PACKED with goodies. One of the ways to analyze large sets of data in the Hadoop File System without writing MapReduce jobs is to use Apache Pig. I highly recommend that you read Programming Pig, in addition to the online documentation. Pig Latin, the scripting language of Pig, is easy to understand, write and, more importantly, extend. Since we do spatial stuff, the first...

Thursday, 23 August 2012

MongoDB + Spring + Mobile Flex API for ArcGIS = Harmonie

I've used MongoDB on a project for the City of Chicago with great success. I was impressed that we can store JSON documents in one giant collection, scale horizontally by just adding new nodes, talk to it through a plethora of language APIs (Java, AS3), run MapReduce tasks, and, my favorite, create a true spatial index on a document property. This is not some...
 
