Algorithmic methods for search and storage

Summary

Finding specified patterns in large amounts of text data is at the core of how we have come to use search engines on the internet and search capabilities of document databases. The same goes for analysis of DNA strings and other biological data, which from a computer scientists perspective is essentially the same sort of data as text. On this course, we study basic algorithms necessary to provide search and pattern matching with enough power for the commonplace and scientific uses that we have grown used to. A related topic, which we also study, is algorithms for compressing data to represent information as space efficiently as possible. In addition to being directly useful, the material on the course serves as examples for a generally deeper knowledge of how to analyze and tackle a problem from the perspective of algorithm theory. This knowledge is widely valuable, outside of the application areas we address.

Admission requirements

7,5 credits within algorithms and data structures, alternative Math 3c/Math D and 3 credits within algorithms and data structures. 7,5 credits in programming.

Selection:

credits 20% final grades 60% national university aptitude test 20%

Syllabus

Syllabus for students autumn 2019

Course Code:
DA295A revision 1
Swedish name:
Algoritmiska metoder för sökning och lagring
Level of specialisation
G1F
Main fields of study:
Computer Science
Language:
English
Date of ratification:
15 February 2019
Decision-making body:
Faculty of Technology and Society
Enforcement date:
02 September 2019

Entry requirements

7,5 credits within algorithms and data structures, alternative Math 3c/Math D and 3 credits within algorithms and data structures. 7,5 credits in programming.

Purpose

The course aims to let the student advance his or her knowledge in algorithmic methodology, specifically applied to methods for efficient searching and storing of sequential data strings, for example, text in natural or formal languages, or genetic data and other biological data. The student learns to make use of, adapt, and implement effective methods for representing, structuring, searching, and compressing large amounts of data.

Contents

  • Representation of sequential data, such as text or DNA sequences
  • Methods for string search, and pattern matching using regular expressions
  • Radix sorting methods
  • Data structures for efficient string lookup, such as an inverted file, trie, suffix tree, and suffix array
  • Methods for lossless data compression such as Huffman code, Ziv–Lempel compression, and the Burrows–Wheeler transform

Learning outcomes

Knowledge and understanding:
On completion of the course the student shall:

  • Account for important algorithms and data structures for sequential data, their use, and their properties with regards to efficiency
  • Account for the operation of important lossless data compression methods
  • Motivate and explain efficiency properties of algorithmic methods to process, search, and compress sequential data
Skills and abilities
On completion of the course the student shall:
  • Choose, and when necessary adapt, algorithmic methods to process, search and compress sequential data.
  • Implement algorithms and data structures for efficient storage and search in large quantities of data
Judgement and approach
On completion of the course the student shall:
  • Faced with a problem specification in processing sequential data, analyze the problem and critically choose an appropriate algorithmic approach for its solution
  • Reason about, and critically evaluate, design and implementation choices from performance and storage space requirement aspects.

Learning activities

Lectures, teacher-supervised time for exercises with and without computer work, individual studies and group work.

Assessments

The course examination consists of a written exam (4 ECTS), and mandatory assignments (3.5 ECTS).
For a passing grade (A–E), all mandatory assignments must be approved, and the written exam graded at least E.
The final course grade is based on the written exam.

Grading system

Excellent (A), Very Good (B), Good (C), Satisfactory (D), Pass (E) or Fail (U).

Course literature and other teaching materials

  • Sedgewick R, Wayne K: Algorithms, 4th ed., Addison Wesley 2011.
  • Witten, I H, Moffat, A, Bell T C: Managing gigabytes: compressing and indexing documents and images, 2nd ed., Morgan Kaufman 1999.
In addition, a collection of scientific texts will be added.

Other Information

The course overlaps with about 50% with the course DA357A

Contact

The education is provided by the Faculty of Technology and Society at the department Computer Science and Media Technology.

Further information

Jesper Larsson, Course Coordinator
Phone: 040-6657303

Application

02 September 2019 - 10 November 2019 Day-time 50% Malmö Application code: mau-98168

National application round

Tuition fees

for non-EU students only

First instalment: 16000 SEK
Full tuition Fee: 16000 SEK

Open for late application

Apply