Disk-based algorithms for big data by Christopher G. Healey

By Christopher G. Healey

Disk-Based Algorithms for Big Data is a product of recent advances in the areas of big data, data analytics, and the underlying file systems and data management algorithms used to support the storage and analysis of massive data collections. The book discusses hard disks and their impact on data management, since hard disk drives remain common in large data clusters. It also explores ways to store and retrieve data through primary and secondary indices. This includes a review of different in-memory sorting and searching algorithms that build a foundation for more sophisticated on-disk approaches like mergesort, B-trees, and extendible hashing.

Following this introduction, the book transitions to more recent topics, including advanced storage technologies like solid-state drives and holographic storage; peer-to-peer (P2P) communication; large file systems and query languages like Hadoop/HDFS, Hive, Cassandra, and Presto; and NoSQL databases like Neo4j for graph structures and MongoDB for unstructured document data.

Designed for senior undergraduate and graduate students, as well as professionals, this book is useful for anyone interested in understanding the foundations and advances in big data storage and management, and big data analytics.

About the Author

Dr. Christopher G. Healey is a tenured Professor in the Department of Computer Science and the Goodnight Distinguished Professor of Analytics in the Institute for Advanced Analytics, both at North Carolina State University in Raleigh, North Carolina. He has published over 50 articles in major journals and conferences in the areas of visualization, visual and data analytics, computer graphics, and artificial intelligence. He is a recipient of the National Science Foundation’s CAREER Early Faculty Development Award and the North Carolina State University Outstanding Instructor Award. He is a Senior Member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE), and an Associate Editor of ACM Transactions on Applied Perception, the leading worldwide journal on the application of human perception to issues in computer science.


Best popular & elementary books

Solutions of Weekly Problem Papers

This Elibron Classics edition is a facsimile reprint of a 1905 edition by Macmillan and Co., Ltd., London.

A Course in Mathematical Methods for Physicists

Introduction and Review: What Do I Need to Know From Calculus?; What I Need From My Intro Physics Class?; Technology and Tables; Appendix: Dimensional Analysis; Problems. Free Fall and Harmonic Oscillators: Free Fall; First Order Differential Equations; The Simple Harmonic Oscillator; Second Order Linear Differential Equations; LRC Circuits; Damped Oscillations; Forced Systems; Cauchy-Euler Equations; Numerical Solutions of ODEs; Numerical Applications; Linear Systems; Problems. Linear Algebra: Finite Dimensional Vector Spaces; Linear Transformations; Eigenvalue Problems; Matrix Formulation of Planar Systems; Applications; Appendix: Diagonali.

Additional resources for Disk-based algorithms for big data

Sample text

Multiple seeks will be needed to locate keys, even if we use a binary search. Reordering the index during addition or deletion will be prohibitively expensive. In these cases, we will most often switch to a different data structure to support indexing, for example, B-trees or external hash tables. There is another significant advantage to simple indices. Not only can we index on a primary key, we can also index on secondary keys. This means we can provide multiple pathways, each optimized for a specific kind of search, into a single data file.
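The idea of several index pathways into one data file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the key names, and the helper functions `build_indices`, `find_by_primary`, and `find_by_secondary` are hypothetical, and an in-memory list stands in for the data file.

```python
import bisect

# Hypothetical data "file": each entry is (primary key, secondary key, payload).
records = [
    (105, "Smith", "record A"),
    (42, "Jones", "record B"),
    (77, "Smith", "record C"),
]

def build_indices(records):
    """Build a primary and a secondary index over the record positions."""
    # Primary index: sorted (primary key, position in data file) pairs.
    primary = sorted((key, pos) for pos, (key, _, _) in enumerate(records))
    # Secondary index: sorted (secondary key, primary key) pairs.
    secondary = sorted((sec, key) for key, sec, _ in records)
    return primary, secondary

def find_by_primary(primary, records, key):
    """Binary search the primary index, then fetch the record it points to."""
    i = bisect.bisect_left(primary, (key,))
    if i < len(primary) and primary[i][0] == key:
        return records[primary[i][1]]
    return None

def find_by_secondary(primary, secondary, records, sec):
    """Collect every record whose secondary key matches `sec`."""
    matches = []
    i = bisect.bisect_left(secondary, (sec,))
    while i < len(secondary) and secondary[i][0] == sec:
        # The secondary index stores primary keys, not file positions.
        matches.append(find_by_primary(primary, records, secondary[i][1]))
        i += 1
    return matches

primary, secondary = build_indices(records)
print(find_by_primary(primary, records, 42))
print(find_by_secondary(primary, secondary, records, "Smith"))
```

In this sketch the secondary index refers to primary keys and not to file positions, so moving a record in the data file only requires updating the primary index. That is one possible design choice, not the only one.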

We must walk over all of these holes when we’re searching for a location for a new record. Finally, by their nature, the small holes created on addition will often never be big enough to hold a new record, and over time they can add up to a significant amount of wasted space. Worst Fit. Suppose we instead kept the availability list sorted in descending order of hole size, with the largest available hole always at the front of the list. A first fit strategy will now find the largest hole capable of storing a new record.
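A minimal sketch of the worst fit strategy follows, assuming holes are tracked in memory as (size, offset) pairs. The class name `WorstFitList` and its methods are hypothetical, and returning the leftover space to the list is one possible policy, not necessarily the book's.

```python
import bisect

class WorstFitList:
    """Availability list kept in descending order of hole size."""

    def __init__(self):
        # Holes are stored as (-size, offset) so that an ascending sort
        # keeps the largest hole at the front of the list.
        self._holes = []

    def add_hole(self, offset, size):
        """Record a free region of `size` bytes starting at `offset`."""
        bisect.insort(self._holes, (-size, offset))

    def allocate(self, size):
        """Return an offset for a new record, or None if no hole fits."""
        # Only the front hole needs checking: if the largest hole is too
        # small, every other hole is too small as well.
        if not self._holes or -self._holes[0][0] < size:
            return None
        neg_size, offset = self._holes.pop(0)
        leftover = -neg_size - size
        if leftover > 0:
            # Return the unused tail of the hole to the availability list.
            self.add_hole(offset + size, leftover)
        return offset

avail = WorstFitList()
avail.add_hole(offset=0, size=10)
avail.add_hole(offset=100, size=50)
avail.add_hole(offset=200, size=30)
print(avail.allocate(20))  # uses the 50-byte hole at offset 100
```

Because the list is sorted, a first fit scan stops at the first entry. The trade-off is that leftover pieces tend to be large enough to reuse, at the cost of breaking up the biggest holes first.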

Searching is the second fundamental operation we will study in this course. As with sorting, efficient searching is a critical foundation in computer science. We review O(n) linear search and O(lg n) binary search, then discuss more sophisticated approaches. Two of these techniques, trees and hashing, form the basis for searching very large data collections that must remain on disk.
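For reference, the two in-memory searches named above can be sketched as follows. The function names are illustrative, and binary search assumes its input is already sorted in ascending order.

```python
def linear_search(items, target):
    """Scan every element in turn: O(n) comparisons in the worst case."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """Halve the search range on each step: O(lg n) comparisons."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1

keys = [3, 8, 15, 23, 42, 57]
print(linear_search(keys, 23), binary_search(keys, 23))
```

Both functions return the index of the target, or -1 when it is absent.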
