Trustie - Truly Inspiring Innovation


Apache Spark

Contributors: 9 | Discussions: 0 | Commits: 6711

Overview

Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster computing: a job can load data into memory and query it repeatedly, much more quickly than with disk-based systems such as Hadoop MapReduce.

To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells.

Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

Created: 2014-05-06 16:05

Project source: http://www.ohloh.net/p/apache-spark

Tags: streamingdata, sql, apache, streaming, distributed, Java, hadoop, scala, clustercomputing, distributed_computing, in_memory, bigdata, hdfs, cluster, machine_learning, python, graph_computing, mapreduce, ec2
