Big Data and Analytics

The volume of business data is growing exponentially across all industries. Over the next decade, the amount of digital information is predicted to increase to 35 trillion gigabytes (35 zettabytes), much of it coming from social networks, sensor networks, and internet search. This “Big Data,” which is transforming science, engineering, medicine, healthcare, and finance, is said to be the next frontier in innovation. Companies now face the challenge of processing large amounts of unstructured data quickly enough to make it useful and relevant to their business.
The Big Data toolsets used by ISVs and software-enabled businesses (SEBs) differ from those more frequently used by enterprises.

For example, enterprise Big Data systems focus on data consistency and data cleansing, whereas ISVs and SEBs are more tolerant of inconsistencies. Enterprise Big Data systems also tend to put a stronger emphasis on licensed software and hardware-based solutions rather than open source and the public cloud. Enterprise database companies are making aggressive strides to extend their reach to manage tens of terabytes of data, approaching Big Data scale.

Reporting: BIRT, Jaspersoft

Data mining, analytics, and modeling: R, SAS, MicroStrategy, Pentaho, SciPy, BI Velocity

ETL and data management tools: Informatica, Kettle, IBM DataStage, PowerCenter, MS SSIS, Oracle PL/SQL, TalendOS, Sqoop

Log acquisition, processing, and analysis: Flume, Splunk, Graphite, Logstash

Work scheduler: Oozie

Distributed databases: MongoDB, Couchbase / CouchDB, Cassandra

Distributed file systems: HDFS, S3

Data warehouse: Hive/Pig, Amazon Redshift

Distributed data processing: Hadoop

Search & indexing: Solr / Lucene, Elasticsearch, custom data crawlers