Big data-security tools, machine learning, labelling,…

Security Tools

  • Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform.
  • Apache Sentry is a system for enforcing fine-grained, role-based authorization on data and metadata stored in a Hadoop cluster.
  • Apache Knox is an application gateway for interacting with the REST APIs and UIs of Apache Hadoop deployments.
  • Kerberos is a network authentication protocol that lets two parties (for example, a client and a service) prove their identity to each other securely.

Machine Learning Tools

  • Apache Mahout is a distributed linear algebra framework with a mathematically expressive Scala DSL, designed for implementing algorithms quickly.
  • Spark MLlib is Spark's machine learning library; its original API is built on RDDs.
  • Spark ML is the higher-level machine learning API of the same library, built on DataFrames.
  • FlinkML is a machine learning library for Apache Flink.

Data Labelling Tools

  • Apache Falcon
  • Apache Atlas

Log Processing Tools

  • Logstash: Open-source tool for log management that loads, transforms, filters and stores logs so they can be searched.
  • Apache Chukwa
  • Fluentd: Open-source data collector for managing logs.
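
The load, filter, transform and save stages that log processing tools apply can be sketched in plain Python. This is only a toy illustration under assumed inputs: the log line format, the names `parse_line` and `run_pipeline`, and the sample lines are all hypothetical, and none of it is the configuration syntax or API of Logstash, Chukwa or Fluentd.

```python
import json
import re

# Assumed (hypothetical) line format: "<date> <time> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+ \S+) (\w+) (.*)$")


def parse_line(line):
    """Load step: turn one raw line into a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    timestamp, level, message = match.groups()
    return {"timestamp": timestamp, "level": level, "message": message}


def run_pipeline(lines, keep_levels=("WARN", "ERROR")):
    """Load each line, filter by level, transform it, and return JSON strings."""
    output = []
    for line in lines:
        event = parse_line(line)
        if event is None or event["level"] not in keep_levels:
            continue  # filter step: drop unparseable or low-severity lines
        event["message_length"] = len(event["message"])  # transform step
        output.append(json.dumps(event, sort_keys=True))  # ready to be stored
    return output


sample = [
    "2024-01-01 10:00:00 INFO Service started",
    "2024-01-01 10:00:05 WARN Disk usage at 91%",
    "garbage",
    "2024-01-01 10:00:09 ERROR Connection refused",
]

for record in run_pipeline(sample):
    print(record)
```

Real tools add input and output plugins, buffering and retries on top of this basic shape; the sketch only shows the order of the stages.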

Serialization Tools

  • Protobuf
  • Avro
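
Both formats write integers in a compact variable-length binary form instead of text. The sketch below shows the general idea: a base-128 varint (as used by Protobuf) and the zig-zag mapping that Avro applies to signed values before varint-encoding them. The function names are hypothetical, and this is not the API of either library, only an illustration of the encoding idea.

```python
def encode_varint(value):
    """Encode a non-negative integer using 7 payload bits per byte."""
    if value < 0:
        raise ValueError("varints encode non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data):
    """Decode the first varint found at the start of a bytes object."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def zigzag(n):
    """Map a signed 64-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return (n << 1) ^ (n >> 63)


print(encode_varint(300).hex())         # ac02
print(encode_varint(zigzag(-1)).hex())  # 01
```

Small values take one byte and larger ones grow as needed, which is one reason these formats are usually more compact than textual encodings.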

Other Tools

  • FUSE
  • NFS
  • WebHDFS
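
WebHDFS exposes HDFS operations over HTTP, with URLs of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. As a minimal sketch, the helper below only builds such URLs and sends no request. The function name `webhdfs_url`, the host name, the port and the file path are assumptions made for illustration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for the given HDFS path and operation."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    # quote() keeps "/" but escapes characters such as spaces in the path
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode address and file path
print(webhdfs_url("namenode.example.com", 9870, "/user/alice/data.csv", "OPEN"))
print(webhdfs_url("namenode.example.com", 9870, "/user/alice", "LISTSTATUS"))
```

A real client would send these URLs with an HTTP library and handle authentication and redirects, which this sketch leaves out.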