Tika - A content analysis toolkit

Apache Tika is a toolkit for detecting the type of a document and extracting its metadata and structured text content. It does not implement every format itself; it delegates to existing parser libraries behind a common interface. It extracts text from the following file formats:

  • HyperText Markup Language
  • XML and derived formats
  • Microsoft Office document formats
  • OpenDocument Format
  • Portable Document Format
  • Electronic Publication Format
  • Rich Text Format
  • Compression and packaging formats
  • Text formats
  • Audio formats
  • Image formats
  • Video formats
  • Java class files and archives
  • The mbox format
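
As a rough illustration of the detect-then-extract workflow described above, here is a minimal sketch using Tika's `Tika` facade class. It assumes `tika-core` is on the classpath, plus a parser bundle such as `tika-parsers-standard-package` if text is to be extracted from the formats listed above. The class name `TikaSketch`, its helper methods, and the sample file name `note.txt` are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;

// Hypothetical wrapper class for illustration; not part of Tika itself.
public class TikaSketch {

    // The facade bundles a default type detector and an auto-detecting parser.
    private static final Tika TIKA = new Tika();

    // Returns the detected media type, using both the leading bytes
    // and the file name as hints.
    public static String detectType(byte[] data, String fileName) {
        return TIKA.detect(data, fileName);
    }

    // Returns the plain-text content of the document in the stream.
    // What comes back depends on which parser modules are installed.
    public static String extractText(InputStream in) throws IOException, TikaException {
        return TIKA.parseToString(in);
    }

    public static void main(String[] args) throws Exception {
        // Sample in-memory document; a real caller would read a file instead.
        byte[] sample = "Hello from Tika".getBytes(StandardCharsets.UTF_8);

        System.out.println("Type: " + detectType(sample, "note.txt"));

        try (InputStream in = new ByteArrayInputStream(sample)) {
            System.out.println("Text: " + extractText(in));
        }
    }
}
```

The facade is the shortest route for simple cases. Code that needs the extracted metadata fields, or finer control over parsing, would likely work with Tika's parser and metadata classes directly.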
http://tika.apache.org/
License: Apache License 2.0
Tech: Java