전기학회논문지 (The Transactions of The Korean Institute of Electrical Engineers)

Korean Title: 빅데이터 분석을 위해 아파치 스파크를 이용한 원시 데이터소스에서 데이터 추출
English Title: Capturing Data from Untapped Sources using Apache Spark for Big Data Analytics
Authors: Aaron Nichie, Heung-Seo Koo (구흥서)
Citation: Vol. 65, No. 7, pp. 1277-1282 (July 2016)
English Abstract:
The term 'Big Data' has been defined to encapsulate a broad spectrum of data sources and data formats. It is often described as unstructured data because of the variety of its formats. Even though the traditional method of structuring data in rows and columns has been reinvented as column families and key-value pairs, or replaced entirely by JSON documents in document-based databases, the fact remains that data have to be reshaped to conform to some structure before they can be stored persistently on disk. ETL processes are key to restructuring data. However, ETL processes incur additional processing overhead and also require that data sources be maintained in predefined formats. Consequently, data in certain formats are ignored completely, because designing ETL processes that cater for all possible data formats is almost impossible. These unconsidered data sources could provide useful insights if they were incorporated into big data analytics. In this project, using the big data solution Apache Spark, we tapped into other sources of data stored in their raw formats, such as various text files and compressed files, and combined them with enterprise data stored persistently in MongoDB for overall data analytics using the MongoDB Aggregation Framework and MapReduce. This differs significantly from traditional ETL systems in that it is compatible with the data regardless of their format at the source.
Keywords: Apache Spark, MapReduce, Big Data, MongoDB, Analytics
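
The workflow the abstract describes (read raw text and compressed files as they are, then combine them with stored enterprise records and aggregate) can be illustrated with a minimal sketch. This is not the authors' implementation: plain Python stands in for Apache Spark, and a list of dictionaries stands in for a MongoDB collection, so that the example runs without a cluster or database. The `customer_id,amount` line format, the sample data, and all function names are assumptions made for illustration only.

```python
import gzip
import os
import tempfile
from functools import reduce


def read_raw_lines(path):
    """Yield non-empty lines from a plain or gzip-compressed text file."""
    # The opener is picked from the file extension; no conversion step runs first.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield line


def parse_event(line):
    """Map step: turn 'customer_id,amount' into a (customer_id, amount) pair."""
    customer_id, amount = line.split(",")
    return customer_id, float(amount)


def total_by_customer(paths):
    """Reduce step: sum the amounts per customer across all raw files."""
    pairs = (parse_event(line) for path in paths for line in read_raw_lines(path))

    def merge(totals, pair):
        key, value = pair
        totals[key] = totals.get(key, 0.0) + value
        return totals

    return reduce(merge, pairs, {})


def enrich(totals, stored_documents):
    """Attach raw-file totals to stored documents (stand-in for a MongoDB collection)."""
    return [
        {**doc, "raw_total": totals.get(doc["_id"], 0.0)}
        for doc in stored_documents
    ]


def demo():
    """Write one plain and one gzip file of hypothetical events, then combine them."""
    stored = [{"_id": "c1", "name": "Acme"}, {"_id": "c2", "name": "Globex"}]
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "events.txt")
        packed = os.path.join(tmp, "events.txt.gz")
        with open(plain, "w", encoding="utf-8") as f:
            f.write("c1,10.0\nc2,5.5\n")
        with gzip.open(packed, "wt", encoding="utf-8") as f:
            f.write("c1,2.5\n")
        return enrich(total_by_customer([plain, packed]), stored)
```

Calling `demo()` returns the stored documents with a `raw_total` field added from the raw files. The structure is applied only when the files are read, which is the contrast with a predefined ETL pipeline that the abstract draws; in an actual Spark job the map and reduce steps would run as distributed operations instead of local function calls.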

* Data provided by NDSL