Database II

of 119

Please download to get full document.

View again

All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
PDF
119 pages
0 downs
4 views
Share
Description
Database II. Goals and Objectives. The course goals are to introduce students to the Advance Topics of Databases Systems. DBMS are at the heart of modern commercial application development. use extends beyond this to many applications
Transcript
Database IIGoals and Objectives
  • The course goals are to introduce students to the Advance Topics of Databases Systems.
  • DBMS are at the heart of modern commercial application development.
  • use extends beyond this to many applications
  • Large amounts of data Storage with efficient update and retrieval. 
  • The aim of this course is to study the internals of DBMSs. Students are introduced to the major problems to be faced when building a DBMS and also the ideas behind and algorithms to implement solutions of these problems.
  • Text Book & Reference Books
  • Text Book
  • Fundamentals of Database Systems by R. Elmasri and S. Navathe, 5th Edition 2007, Benjamin/Cummings
  • Database Systems Concepts By Abraham Silberschatz (Fifth Edition)
  • Reference Books
  • Modern Database management , Eight Edition, By Jefrey A. Hoffer & Mary B. Prescott.
  • Background: Prerequisite lecturesRelational Algebra Sects: 6.1 –– 6.3 – 6.4 – 6.5Basic Algebra Operations (projection, selection, and join)Types of join (Θ-join, Equi-join, and Natural join)Set Operations (Union, Intersection, and Difference)Division OperationSQL Sects: 8.4 – 8.5 – 8.8Basic SELECT-FROM-WHERE query blockNested queriesViewsIndexingSects: 14.1 – 14.2 – 14.3Contents
  • Chapter 15 - Algorithms for Query Processing and Optimization 3-4 Weeks
  • Translating SQL Queries into Relational Algebra
  • Algorithms for External Sorting
  • Algorithms for SELECT and JOIN Operations
  • Algorithms for PROJECT and SET Operations
  • Implementing Aggregate Operations and OUTER JOINS
  • Combining Operations Using Pipelining
  • Using Heuristics in Query Optimization
  • Using Selectivity and Cost Estimates in Query Optimization
  • Overview of Query Optimization in Oracle
  • Semantic Query Optimization
  • ContentsCont…
  • Transaction Processing 5-6 Weeks
  • Chapter 17 - Introduction to Transaction Processing Concepts and Theory
  • Introduction to Transaction Processing
  • Transaction and System Concepts
  • Desirable Properties of Transactions
  • Characterizing Schedules Based on Recoverability
  • Characterizing Schedules Based on Serializability
  • Transaction Support in SQL
  • ContentsCont…
  • Chapter 18 - Concurrency Control Techniques
  • Two-Phase Locking Techniques for Concurrency Control
  • Concurrency Control Based on Timestamp Ordering
  • Multiversion Concurrency Control Techniques
  • Validation (Optimistic) Concurrency Control Techniques
  • Granularity of Data Items and Multiple Granularity Locking
  • Using Locks for Concurrency Control in Indexes
  • Other Concurrency Control Issues
  • ContentsCont…
  • Chapter 19 - Database Recovery Techniques
  • Recovery Concepts
  • Recovery Techniques Based on Deferred Update
  • Recovery Techniques Based on Immediate Update
  • Shadow Paging
  • The ARIES Recovery Algorithm
  • Recovery in Multidatabase Systems
  • Database Backup and Recovery from Catastrophic Failures
  • Security 2 Weeks
  • Chapter 23: Database Security
  • QUERY PROCESSING and OPTIMIZATIONAgenda of QUERY PROCESSING and OPTIMIZATION
  • Query Processing and Optimization: Why?
  • Steps of Processing
  • 0. Introduction to Query Processing1. Translating SQL Queries into Relational Algebra 2. Algorithms for External Sorting3. Algorithms for SELECT and JOIN Operations4. Algorithms for PROJECT and SET Operations5. Implementing Aggregate Operations and Outer Joins6. Combining Operations using PipeliningAgenda of QUERY PROCESSING and OPTIMIZATIONIII. Methods of Optimization7. Using Heuristics in Query Optimization
  • Heuristic (Logical Transformations)
  • Transformation Rules
  • Heuristic Optimization Guidelines
  • 8. Using Selectivity and Cost Estimates in Query Optimization
  • Cost Based (Physical Execution Costs)
  • Data Storage/Access Refresher
  • Catalog & Costs
  • 9. Overview of Query Optimization in Oracle10. Semantic Query OptimizationIV. What All This Means To YOU?Query Processing & OptimizationWhat is Query Processing?
  • Steps required to transform high level SQL query into a correct and “efficient” strategy for execution and retrieval.
  • What is Query Optimization?
  • The activity of choosing a single “efficient” execution strategy (from hundreds) as determined by database catalog statistics.
  • Questions for Query Optimization
  • Which relational algebra expression, equivalent to the given query, will lead to the most efficient solution plan?
  • For each algebraic operator, what algorithm (of several available) do we use to compute that operator?
  • How do operations pass data (main memory buffer, disk buffer,…)?
  • Will this plan minimize resource usage? (CPU/Response Time/Disk)
  • Query Processing: Who needs it?A motivating example:
  • Identify all managers who work in a London branch
  • SELECT *
  • FROM Staff s, Branch b
  • WHERE s.branchNo = b.branchNo AND s.position = ‘Manager’ AND b.city = ‘london’;
  • Results in these equivalent relational algebra statements (1) s(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch) (2) s(position=‘Manager’)^(city=‘London’) (StaffwvStaff.branchNo = Branch.branchNoBranch) (3) [s(position=‘Manager’) (Staff)] wvStaff.branchNo = Branch.branchNo [s(city=‘London’) (Branch)]A Motivating Example (cont…)Assume:
  • 1000 tuples in Staff.
  • ~ 50 Managers
  • 50 tuples in Branch.
  • ~ 5 London branches
  • No indexes or sort keys
  • All temporary results are written back to disk (memory is small)
  • Tuples are accessed one at a time (not in blocks)
  • Motivating Example: Query 1 (Bad)s(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo)(Staff X Branch)
  • Requires (1000+50) disk accesses to read from Staff and Branch relations
  • Creates temporary relation of Cartesian Product (1000*50) tuples
  • Requires (1000*50) disk access to read in temporary relation and test predicate
  • Total Work = (1000+50) + 2*(1000*50) = 101,050 I/O operationsMotivating Example: Query 2 (Better)s(position=‘Manager’)^(city=‘London’)(Staff wvStaff.branchNo = Branch.branchNoBranch)
  • Again requires (1000+50) disk accesses to read from Staff and Branch
  • Joins Staffand Branch on branchNo with 1000 tuples
  • (1 employee : 1 branch )
  • Requires (1000) disk access to read in joined relation and check predicate
  • Total Work = (1000+50) + 2*(1000) = 3050 I/O operations3300% Improvement over Query 1Motivating Example: Query 3 (Best)[ s(position=‘Manager’) (Staff) ]wvStaff.branchNo = Branch.branchNo [ s(city=‘London’) (Branch) ]
  • Read Staff relation to determine ‘Managers’ (1000 reads)
  • Create 50 tuple relation(50 writes)
  • Read Branch relation to determine ‘London’ branches (50 reads)
  • Create 5 tuple relation(5 writes)
  • Join reduced relations and check predicate (50 + 5 reads)
  • Total Work = 1000 + 2*(50) + 5 + (50 + 5) = 1160 I/O operations8700% Improvement over Query 1Consider if Staff and Branch relations were 10x size? 100x? Yikes!0. Introduction to Query Processing (1)
  • Query optimization:
  • The process of choosing a suitable execution strategy for processing a query.
  • Two internal representations of a query:
  • Query Tree
  • Query Graph
  • Introduction to Query Processing (2)Check Attributes and RelationsIdentification of the query languageCreation of possible Relational algebraCheck SyntaxesChoose the strategy for executing the query and produce the optimal executing plan1. Translating SQL Queries into Relational Algebra (1)
  • Query block:
  • The basic unit that can be translated into the algebraic operators and optimized.
  • A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block.
  • Nested queries within a query are identified as separate query blocks.
  • Aggregate operators in SQL must be included in the extended algebra.
  • Translating SQL Queries into Relational Algebra (2)SELECT LNAME, FNAMEFROM EMPLOYEEWHERE SALARY > ( SELECT MAX (SALARY)FROM EMPLOYEEWHERE DNO = 5);SELECT LNAME, FNAMEFROM EMPLOYEEWHERE SALARY > CSELECT MAX (SALARY)FROM EMPLOYEEWHERE DNO = 5πLNAME, FNAME(σSALARY>C(EMPLOYEE))ℱMAX SALARY(σDNO=5 (EMPLOYEE))2. Algorithms for External Sorting (1)
  • External sorting:
  • Refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files.
  • Sort-Merge strategy:
  • Starts by sorting small subfiles (runs) of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn.
  • Sorting phase: nR = (b/nB)
  • Merging phase: dM = Min (nB-1, nR); nP = (logdM(nR))
  • nR: number of initial runs; b: number of file blocks;
  • nB: available buffer space; dM: degree of merging;
  • nP: number of passes.
  • Sort/Merge Sort Sort Merge Partition Sort SortMergingSingle phaseMergingMultiple phaseExternal SortingSort-merge strategy
  • The Sorting phase
  • Runs of file are read into main memory
  • Runs are sorted using an internal sorting algorithm
  • Runs are written back to disk as temporary sorted subfiles
  • nR : number of initial runs b : number of file blocksnB: available buffer spacenR: Example: nB: 5 blocks, b: 1024 blocksnB== 205 runsExternal SortingSort-merge strategy (Cont.)
  • Merging phase Sorted runs are merged during one or more passses.
  • dM: degree of merging (dM–way merging)number of runs that can be merged together in each passdM= min ( ( nB-1), nR) number of passes = ┌ ┐2051111…152144…..11321316…..4364…..614205External SortingSort-merge strategy (Cont.)Example: (2*b + 2* ) dM = 4 ( 4-way merging)2. Algorithms for External Sorting (1)
  • The typical external sorting algorithm uses a sort-merge strategy, which starts by
  • sorting small subfiles—called runs—of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn. The sort-merge algorithm, like other database algorithms, requires buffer space in main memory, where the actual sorting and merging of the runs is performed. two phases: the sorting phase and the merging phase.2. Algorithms for External Sorting (1)
  • In the sorting phase, runs (portions or pieces) of the file that can fit in the available
  • buffer space are read into main memory, sorted using an internal sorting algorithm, and written back to disk as temporary sorted subfiles (or runs).
  • The size of a run and number of initial runs (nR) is
  • dictated by the number of file blocks (b) and the available buffer space (nB). For example, if nB = 5 blocks and the size of the file b = 1024 blocks, then nR= ⎡(b/nB)⎤ or 205 initial runs each of size 5 blocks (except the last run which will have 4 blocks).
  • Hence, after the sort phase, 205 sorted runs are stored as temporary subfiles on disk.
  • 2. Algorithms for External Sorting (1)
  • In the merging phase, the sorted runs are merged during one or more passes. The degree of merging (dM) is the number of runs that can be merged in each pass.
  • In each pass, one buffer block is needed to hold one block from each of the runs being merged, and one block is needed for containing one block of the merge result.
  • Hence, dM is the smaller of (nB − 1) and nR, and the number of passes is ⎡(logdM(nR))⎤. In our example, dM = 4 (four-way merging), so the 205 initial sorted runs would be merged into 52 at the end of the first pass, which are then merged into 13, then 4, then 1 run, which means that four passes are needed.
  • 2. Algorithms for External Sorting (1)
  • The minimum dM of 2 gives the worst-case performance of the algorithm, which is (2 ∗ b) + (2 ∗ (b ∗ (log2 b))).
  • The first term represents the number of block accesses for the sort phase, since each file block is accessed twice—once for reading into memory and once for writing the records back to disk after sorting.
  • The second term represents the number of block accesses for the merge phase, assuming the worst-case dM of 2. In general, the log is taken to the base dM and the expression for number of block accesses becomes (2 ∗ b) + (2 ∗ (b ∗ (logdM nR))).
  • Userselect *from R, Swhere …Resolve the references,Syntax errors etc.Converts the query to an internal formatrelational algebra likeQuery ParserQuery OptimizerResultsFind the best way to evaluate the query Which index to use ? What join method to use ? … Query ProcessorRead the data from the filesDo the query processingjoins, selections, aggregates …R, B+Tree on R.aS, Hash Index on S.a…Overview“Cost”
  • Complicated to compute
  • We will focus on disk:
  • Number of I/Os ?
  • Not sufficient
  • Number of seeks matters a lot… why ?
  • tT – time to transfer one block
  • tS – time for one seek
  • Cost for b block transfers plus S seeksb * tT + S * tS
  • Measured in seconds
  • 3. Algorithms for SELECT and JOIN Operations (1)
  • Implementing the SELECT Operation
  • Examples:
  • (OP1): sSSN='123456789' (EMPLOYEE)
  • (OP2): sDNUMBER>5(DEPARTMENT)
  • (OP3): sDNO=5(EMPLOYEE)
  • (OP4): sDNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)
  • (OP5): sESSN=123456789 AND PNO=10(WORKS_ON)
  • Basic Algorithms for Executing Query OperationsImplementingSELECT(OP1) σSSN=12345689 (EMPLOYEE)equality comparison on key attribute(OP2) σDNUMBER > 5 (DEPARMENT)nonequality comparison on key attribute(OP3) σDNO=5 (EMPLOYEE) equality comparison on non key attribute(OP4) σDNO=5 AND SALARY >30000 AND SEX=F(EMPLOYEE)conjunctive condition(OP5) σESSN=123456789 AND DNO=10 (WORKS_ON) conjunctive condition and composite keySearch Methods for Selectionfile scans / index scansS1. Linear Search (brute force)S2. Binary Search SSN=123456789 (OP1) ordering attribute for EMPLOYEES3. Use primary index or hash key (single record) SSN=123456789 (OP1) Primary index or hash keyS4. Use primary index (multiple records) DNUMBER>5 (OP2)primary indexS5. Use clustering index (multiple records) DNO=5 (OP3)clustering index
  • Locate
  • Find proceeding subsequent
  • Algorithms for SELECT and JOIN Operations (2)
  • Implementing the SELECT Operation (contd.):
  • Search Methods for Simple Selection:
  • S1 Linear search (brute force):
  • Retrieve every record in the file, and test whether its attribute values satisfy the selection condition.
  • S2 Binary search:
  • If the selection condition involves an equality comparison on a key attribute on which the file is ordered, binary search (which is more efficient than linear search) can be used. (See OP1).
  • S3 Using a primary index or hash key to retrieve a single record:
  • If the selection condition involves an equality comparison on a key attribute with a primary index (or a hash key), use the primary index (or the hash key) to retrieve the record.
  • Selection Operation
  • select * from person where SSN = “123”
  • Option 1: Sequential Scan
  • Read the relation start to end and look for “123”
  • Can always be used (not true for the other options)
  • Cost ?
  • Let br = Number of relation blocks
  • Then:
  • 1 seek and br block transfers
  • So:
  • tS + br * tT sec
  • Improvements:
  • If SSN is a key, then can stop when found
  • So on average, br/2 blocks accessed
  • Selection Operation
  • select * from person where SSN = “123”
  • Option 2 : Binary Search:
  • Pre-condition:
  • The relation is sorted on SSN
  • Selection condition is an equality
  • E.g. can’t apply to “Name like ‘%424%’”
  • Do binary search
  • Cost of finding the first tuple that matches
  • log2(br) * (tT + tS)
  • All I/Os are random, so need a seek for all
  • The last few are closeby, but we ignore such small effects
  • Not quite: What if 10000 tuples match the condition ?
  • Incurs additional cost
  • Primary index on the ordering key fieldAlgorithms for SELECT and JOIN Operations (3)
  • Implementing the SELECT Operation (contd.):
  • Search Methods for Simple Selection:
  • S4 Using a primary index to retrieve multiple records:
  • If the comparison condition is >, ≥, <, or ≤ on a key field with a primary index, use the index to find the record satisfying the corresponding equality condition, then retrieve all subsequent records in the (ordered) file.
  • S5 Using a clustering index to retrieve multiple records:
  • If the selection condition involves an equality comparison on a non-key attribute with a clustering index, use the clustering index to retrieve all the records satisfying the selection condition.
  • S6 Using a secondary index:
  • On an equality comparison, this search method can be used to retrieve a single record if the indexing field has unique values (is a key) or to retrieve multiple records if the indexing field is not a key.
  • In addition, it can be used to retrieve records on conditions involving >,>=, <, or <=. (FOR RANGE QUERIES)
  • A Clustering Index Example
  • FIGURE 14.2A clustering index on the DEPTNUMBER ordering non-key field of an EMPLOYEE file.
  • A Two-level Primary IndexExample of a Dense Secondary IndexAn Example of a Secondary Index3. Algorithms for SELECT and JOIN Operations (1)
  • Implementing the SELECT Operation
  • Examples:
  • (OP1): sSSN='123456789' (EMPLOYEE)
  • (OP2): sDNUMBER>5(DEPARTMENT)
  • (OP3): sDNO=5(EMPLOYEE)
  • (OP4): sDNO=5 AND SALARY>30000 AND SEX=F(EMPLOYEE)
  • (OP5): sESSN=123456789 AND PNO=10(WORKS_ON)
  • Algorithms for SELECT and JOIN Operations (4)
  • Implementing the SELECT Operation (contd.):
  • Search Methods for Simple Selection:
  • S7 Conjunctive selection:
  • If an attribute involved in any single simple condition in the conjunctive condition has an access path that permits the use of one of the methods S2 to S6, use that condition to retrieve the records and then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive condition.
  • S8 Conjunctive selection using a composite index
  • If two or more attributes are involved in equality conditions in the conjunctive condition and a composite index (or hash structure) exists on the combined field, we can use the index directly.
  • COND1 AND COND2 AND …AND CONDNMore than one of attributes involved in conditions that have access pathChoose the access path
  • Retrieve the fewest records
  • In the most efficient wayselectivity=estimates of selectivities =(1) key attribute (2) nonkey attribute where i : # of listing values for attribute in r(n)
  • Algorithms for SELECT and JOIN Operations (5)
  • Implementing the SELECT Operation (contd.):
  • Search Methods for Complex Selection:
  • S9 Conjunctive selection by intersection of record pointers:
  • This method is possible if secondary indexes are available on all (or some of) the fields involved in equality comparison conditions in the conjunctive condition and if the indexes include record pointers (rather than block pointers).
  • Each index can be used to retrieve the record pointers that satisfy the individual condition.
  • The intersection of these sets of record pointers gives the record pointers that satisfy the conjunctive condition, which are then used to retrieve those records directly.
  • If only some of the conditions have secondary indexes, each retrieved record is further tested to determine whether it satisfies the remaining conditions.
  • Selection Operation
  • Complex selections
  • Conjunctive: select * fro
  • Related Search
    We Need Your Support
    Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

    Thanks to everyone for your continued support.

    No, Thanks