Open/public datasets are useful for several purposes:
    1. When working on a project (e.g. an online/GIS application) that needs data as input.
    2. When teaching Statistics, Data Mining, or Machine Learning courses, to illustrate how a model works and what its properties are.
    3. When giving assignments to students in the courses mentioned above.
    4. When doing research that proposes a new model. We need open/public datasets as benchmarks to compare the proposed model against existing (state-of-the-art) models.
    5. When reviewing or surveying methods in Statistics/Data Mining.
    6. For students, learners, and the general public who want to practice using software or learn a particular method.



Below is a compilation of dataset repositories for the purposes above, or for other purposes. If you know of other data sources that might be useful to others, please share them in the comments below so I can update the list (thanks in advance).
Indonesian Data Sources:
  1. Dataset Indonesia [data.go.id]
  2. Badan Pusat Statistik (BPS, Statistics Indonesia) (summary data only)
  3. UN Global Pulse research data on Indonesia
  4. Global Open Data Index: Indonesia
  5. World Bank data on Indonesia
  6. OECD data on Indonesia
  7. Indonesian administrative region data from Kemendagri (Ministry of Home Affairs)
  8. Food prices
  9. Commodity prices
General Datasets:
  1. Google Public Data Explorer
  2. Microsoft Research Open Datasets
  3. UC Irvine Machine Learning Repository
  4. National Flight Data Center (NFDC)
  5. FAA Data & Research
  6. Flight Delay Information
  7. FAA Aviation Safety Information Analysis and Sharing (ASIAS)
  8. Aircraft Situation Display to Industry (ASDI)
  9. NTSB Accident Database & Synopses
  10. OpenFlights.org
  11. The Center for Innovation in Engineering and Science Education: Real-Time Data Sites
  12. MIT Airline Data Project
  13. Space – Real-Time Space Weather Data Sources
  14. Politics – Data on the U.S. Congress – A Joint Effort from Brookings and the American Enterprise Institute
  15. Sports – Open Sports Data/API
  16. Sports – Football (Soccer) Stats
  17. Government – Public Government Data Sets
  18. U.S. Department of Homeland Security Data
  19. Public Data for the State of Utah
  20. Finding Data on the Internet – Inside-R
  21. Nathan Yau’s collection of data sets
  22. Dr. Jerry A. Smith’s Favorite Data sets
  23. Hilary Mason’s “Research Quality” Data-sets
  24. Peter Skomoroch’s list of data sets on Delicious
  25. Data Wrangling blog data set list
  26. DonorsChoose.org – Hacking Education: A Contest for Developers and Data Crunchers
  27. Datasets for “The Elements of Statistical Learning”
  28. Enron Email Dataset
  29. Yandex
  30. The Data Page
  31. Public Data Sets on Amazon
  32. Miami School of Business Statistical Data Sets
  33. Public data put to good use
  34. ASU GeoDA Center Data
  35. European Cities 1M Data Sets
  36. University of Edinburgh School of Informatics Data Sets for Data Mining
  37. Opinion Mining, Sentiment Analysis, and Opinion Spam Detection
  38. Quandl – Intelligent search for numerical data
  39. Gephi Graph Visualization Sample Data Sets
  40. CitiBike, by NYC Bike Share – Station data
  41. Large Datasets 
  42. Air Quality Notifications
  43. The GDELT Project – Global Database of Events, Language, and Tone
  44. http://www.kdnuggets.com/datasets/index.html
  45. http://goo.gl/9eNqFq [more from KDnuggets]
  46. http://archive.ics.uci.edu/ml/
  47. http://www.stat.ucla.edu/data/
  48. http://lib.stat.cmu.edu/
  49. http://www.umass.edu/statdata/statdata/
  50. http://datamarket.com/data/list/?q=provider:tsdl
  51. http://lib.stat.cmu.edu/DASL/
  52. http://www.statsci.org/data/index.html
  53. http://trec.nist.gov/data.html
  54. http://graphlab.org/resources/datasets.html
  55. http://www.scaleunlimited.com/datasets/public-datasets/
  56. http://www.datawrangling.com/some-datasets-available-on-the-web
  57. http://www.inf.ed.ac.uk/teaching/courses/dme/html/datasets0405.html
  58. http://pami.uwaterloo.ca/~hammouda/webdata/
  59. http://www.daviddlewis.com/resources/testcollections/reuters21578/
  60. http://dumps.wikimedia.org/
  61. http://www.cs.cmu.edu/~WebKB/
  62. http://www.uco.es/~in1rosaj/utiles/datasets.html
  63. http://www.ke.tu-darmstadt.de/resources/eurlex/eurlex.html
  64. KEEL
Hope this is useful,
Cheers,
</TES>®



Data Science, IoT, & Big Data

Taufik Sutanto has 35 posts and counting.