Introduction to Data Analytics

SUBJECT OUTLINE
31250 Introduction to Data Analytics

Course area: UTS: Information Technology
Delivery: Spring 2019; City
Subject classification: Information Technology: Software
Credit points: 6cp
Result type: Grade and marks
Attendance: 2hpw (workshop/laboratory)
Recommended studies: knowledge of database technologies

27/06/2019 (Spring 2019) © University of Technology Sydney

Subject coordinator
Professor Paul Kennedy
Email:
The Subject Coordinator may be contacted by email if you have matters of a personal nature to discuss, e.g., illness, study problems, and for issues to do with extensions, group problems or other matters of importance.
All email sent to subject coordinators, tutors or lecturers must have a clear subject line that states the subject number followed by the subject of the email [e.g. Subject 31250, Request for Extension], and must be sent from your UTS email address. In particular, avoid replying directly to announcement messages to discuss unrelated issues.
Consultation hours: After the class. Appointments outside the given consultation hours may be arranged where circumstances require; to request one, please contact the subject coordinator by email.

Teaching staff
Contact details of tutorial staff and guest lecturers will be provided in class.
Most classes will be given by:
Prof Paul Kennedy
Room: CB11.7.111
Email:

Subject description
Data analytics is the art and science of turning large quantities of usually incomprehensible data into meaningful and commercially valuable information. It is the basis of modern computer and intelligence. It includes a number of IT areas, such as statistical methods for identifying patterns in data and making inferences; database technologies for managing the data sets to be mined; a range of intelligent technologies that automatically derive patterns from data; and visualisation and other multimedia techniques that support human pattern discovery capabilities. This subject offers the foundations of data analytics, data mining and knowledge discovery methods and their application to practical problems. It brings together the state-of-the-art research and practical techniques in data analytics, providing students with the necessary knowledge and capacity to initiate and conduct data mining research and development projects, and professionally communicate with experts.

Subject learning objectives (SLOs)
Upon successful completion of this subject students should be able to:
Explain the background of data analytics including the business and society context;
Use data analytics to explore and gain a broad understanding of a dataset;
Outline the scope and limitations of several state-of-the-art data analytics methods;
Use data analytics methods to make predictions for a dataset;
Organise and implement a data analytics project in a business environment;
Communicate the results of a data analytics project.

Course intended learning outcomes (CILOs)
This subject also contributes specifically to the development of the following Course Intended Learning Outcomes (CILOs):
Identify, interpret and analyse stakeholder needs (A.1)
Apply systems thinking to understand complex system behaviour, including interactions between components and with other systems (social, cultural, legislative, environmental, business, etc.) (A.5)
Identify and apply relevant problem-solving methodologies (B.1)
Design components, systems and/or processes to meet required specifications (B.2)
Synthesise alternative/innovative solutions, concepts and procedures (B.3)
Communicate effectively in ways appropriate to the discipline, audience and purpose (E.1)

Teaching and learning strategies
Subject presentation includes combined workshop and laboratory sessions (2 hours) and research and development work for the assignments. Students will need to undertake preparation using material on UTSOnline to make effective use of their class time. Online lectures will present the theoretical aspects of data mining. Guest lectures about case studies of real-world business applications of data mining techniques will be face-to-face. The laboratory sessions focus on hands-on experience in data mining and data analytics tools, and the understanding and interpretation of the results. Practical assignments can be performed anywhere. The labs will provide the tools necessary to complete these assignments.
Preparation will help students to participate in the in-class individual and group exercises. Regular zero-mark quizzes throughout the semester will allow students to gauge their progress.

Content (topics)
The subject will cover topics from the following:
a. Introduction to data mining: problems; data mining concepts, types of data that we collect, the data mining and knowledge discovery process (CRISP-DM methodology, SAS SEMMA methodology), differences between data mining and knowledge discovery, what can be discovered; the concepts of ‘interestingness’, ‘usefulness’ and ‘novelty’ of discovered patterns; overview of application areas, the data mining professional.
b. Visual data exploration and mining: data visualisation techniques and their applicability in data mining, visual data mining methods.
c. Data pre-processing and transformation: problems; small and large data sets; missing data and dealing with it; noisy data and sampling; techniques for data cleaning; techniques for removing sensitive information, legal issues.
d. Classification and Prediction: problems for classification and prediction; classification by decision tree induction; classification by support vector machine; ensemble methods and random forest; classification accuracy; issues in prediction; applications in medical diagnosis, credit approval, target marketing, DNA microarray analysis.
e. Clustering: problems for cluster analysis; types of data; partitioning methods, hierarchical methods; density-based methods; k-means and related methods.
f. Deployment of results: representing patterns as rules, functions, cases; model deployment; industry applications.

Program
Week/Session  Dates   Description
1             22 Jul  Introduction to the subject. Preparation Week. Learning to use the software.
2             29 Jul  Introduction to Data Analytics
3             5 Aug   Data
4             12 Aug  Data preprocessing
5             19 Aug  Visual data exploration
6             26 Aug  Clustering
7             2 Sep   Classification and Prediction: Decision Trees
              9 Sep   StuVac. No class.
8             16 Sep  Evaluating Predictive Models
9             23 Sep  Ensemble methods and random forest
10            30 Sep  Linear methods and the Support Vector Machine
11            7 Oct   Neural networks
12            14 Oct  Guest Lecture and Prize Distribution
StuVac        21 Oct  StuVac. No class.

Assessment
Details about assignments and submission procedures are provided on the subject website.
Assignments are to be submitted to UTS Online.
Zero-mark quizzes will help students to gauge their progress in the subject. Continuous monitoring and feedback is given to students during in-class activities that they can use to help with their assignments.

Assessment task 1: Dream Jobs
Objective(s): This assessment task addresses the following subject learning objectives (SLOs): 1
This assessment task contributes to the development of the following course intended learning outcomes (CILOs): A.1, A.5 and E.1
Type: Report
Groupwork: Individual
Weight: 15%
Task: Individual assessment.
This assignment is an individual project where students look at several job advertisements in data science / data analytics. From these they distil the skills and attributes required for the jobs.
They write a short report outlining how their current experience and skills measure up and outline a set of experiences and a plan to gain the required skills and attributes.
Length: The task requires submission of a report of 5 pages in an 11 or 12 point font.
Due: 11.59pm Friday 9 August 2019
Criteria linkages:
Criteria                           Weight (%)  SLOs  CILOs
Identifying appropriate jobs       25          1     A.1
Identifying skills and attributes  25          1     A.1, A.5
Current status with evidence       25          1     E.1
Credible plan                      25          1     E.1
SLOs: subject learning objectives
CILOs: course intended learning outcomes
Further information:
Feedback processes: Peer feedback will be given in class. Marked assignments will be returned within 2-3 weeks of submission through returned work.
Weighting of Assessment Criteria is approximate. Please refer to the marking guide for specific weighting allocation.

Assessment task 2: Data exploration and preparation
Objective(s): This assessment task addresses the following subject learning objectives (SLOs): 2 and 3
This assessment task contributes to the development of the following course intended learning outcomes (CILOs): B.1, B.2, B.3 and E.1
Type: Report
Groupwork: Individual
Weight: 35%
Task: Individual assessment.
This assignment includes practical work on data visualisation, exploration and preparation (preprocessing and transformation) for data analytics.
Length: A report of about 20 pages in an 11 or 12 point font.
Due: 11.59pm Friday 6 September 2019
Criteria linkages:
Criteria                                                                      Weight (%)  SLOs  CILOs
Correctness and understanding of pre-processing and transformation steps     33          2, 3  B.1, B.2, B.3
Depth of understanding of the data exploration                               33          2     B.1, B.2, B.3, E.1
Quality of the communication of results                                      34          2     E.1
SLOs: subject learning objectives
CILOs: course intended learning outcomes
Further information:
Feedback processes: marks with feedback within 2-3 weeks of submission through returned work.
Weighting of Assessment Criteria is approximate.
Please refer to the marking guide for specific weighting allocation.

Assessment task 3: Data mining in action
Objective(s): This assessment task addresses the following subject learning objectives (SLOs): 3, 4, 5 and 6
This assessment task contributes to the development of the following course intended learning outcomes (CILOs): A.1, B.1, B.2, B.3 and E.1
Type: Report
Groupwork: Individual
Weight: 50%
Task: Students will be allocated a type of classifier to use for a predictive analytics task. They must use at least this classifier, and will most probably also use several other methods of their own choosing, to solve the problem. The best classification model will be submitted to the Kaggle web site and the students with the best results by the due date will win a prize.
Students must submit a 10-page report discussing how they solved the problem and giving their results. This will contribute 30 of the 50 marks.
Each student will also undertake a short oral defence of their work. This will contribute the remaining 20 of the 50 marks, and defences will be run throughout the session. At the oral defence, students will answer questions about their classifier(s), showing the workflow or code. Students who fail will receive 0 out of 20 marks. Students showing their allocated classifier will receive 10 marks. Students showing several classifiers with some preprocessing and parameter setting will receive 15 marks. Students showing an insightful and thorough investigation will receive 20 marks. Students who fail are allowed to undertake the oral defence once more and, if they pass, will receive a maximum of 10 marks.
Length: 10-page report. Oral defence of around 5 minutes.
Due: 11.59pm Friday 11 October 2019
Criteria linkages:
Criteria                                                  Weight (%)  SLOs        CILOs
Quality of report detailing how the problem was solved   60          3, 4, 5, 6  A.1, B.1, B.2, B.3, E.1
Oral defence                                              40          3, 4, 5, 6  A.1, B.1, B.2, B.3, E.1
SLOs: subject learning objectives
CILOs: course intended learning outcomes
Further information:
Feedback processes: For the report, marks with feedback within 2-3 weeks of submission through returned work.
Weighting of Assessment Criteria is approximate. Please refer to the marking guide for specific weighting allocation.

Minimum requirements
In order to pass the subject, a student must achieve an overall mark of 50% or more.

Recommended texts
1. Pang-Ning Tan, Michael Steinbach and Vipin Kumar (2006). Introduction to Data Mining, Addison-Wesley.
2. Graham Williams (2011). Data Mining with Rattle and R, Springer. This is a nice simple introduction to data mining using the R statistical language and Rattle, a package that sits on top of it.
3. Margaret H. Dunham (2002). Data Mining: Introductory and Advanced Topics, Prentice Hall. The book offers the undergraduate Computing and IT student an introduction to the full spectrum of data mining concepts and algorithms in a comprehensive and consistent manner. The depth of coverage of each topic or method is exactly right and appropriate. Each algorithm is presented in pseudocode sufficient for any interested student to convert it into a working implementation.
4. Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques, Morgan Kaufmann. The book comes from an experienced database professional and also provides an introduction to data mining concepts and techniques, but from a database perspective. The book provides details about data warehousing and OLAP techniques, and examines algorithms, data structures, data types, and the complexity of algorithms.
5. Pyle, D. (1999). Data Preparation for Data Mining, San Francisco, Calif.: Morgan Kaufmann Publishers. A key book on data pre-processing.
6. Hand, D. J., Mannila, H. and Smyth, P. (2001). Principles of Data Mining, Bradford Books, MIT Press. This text provides a more engineering-oriented approach to the subject.
7. Witten, I. H. and Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, CA. The book is a light, broad view of data mining. The book complements the WEKA toolkit used in the class.
8. Westphal, C. and Blaxton, T. (1998). Data Mining Solutions: Methods and Tools for Solving Real World Problems, John Wiley and Sons, NY. An excellent light text with lots of tools discussed (but at the level of 1997-98 developments).
9. Michael Friendly (2001). Visualizing Categorical Data, SAS Press. The issues in this book are related to visualisation, visual data mining and the representation of the results of data mining.

References
Krzysztof J. Cios (ed.) (2000), IEEE Engineering in Medicine and Biology Magazine, Special Issue on Data Mining and Knowledge Discovery in Medical Data. This special issue provides the latest developments in the application of data mining methods for the discovery of medical knowledge.
Michael J. A. Berry, Gordon Linoff (2000). Mastering Data Mining: The Art and Science of Customer Relationship Management, New York, Chichester: Wiley Computer Publishing. This book is devoted to one of the hottest specialized applications of data mining – customer relationship management.
Kovalerchuk B. and Vityaev E. (2000), Data Mining in Finance: Advances in Relational and Hybrid Methods, Kluwer Academic. This book is focused on financial data mining (requires a good mathematical background).
The UTS Coursework Assessment Policy & Procedure Manual, at

Other resources
Subject announcements, the topic discussion boards for the subject and other communication tools will be in UTS Online.
You can enter UTS Online at

Graduate attribute development
For a full list of the faculty’s graduate attributes, refer to the FEIT Graduate Attributes webpage.

Assessment: faculty procedures and advice

Extensions
When, due to extenuating circumstances, you are unable to submit or present an assessment task on time, please contact your subject coordinator before the assessment task is due to discuss an extension. Extensions may be granted up to a maximum of 5 days (120 hours).
In all cases you should have extensions confirmed in writing.

Special consideration
If you believe your performance in an assessment item or exam has been adversely affected by circumstances beyond your control, such as a serious illness, loss or bereavement, hardship, trauma, or exceptional employment demands, you may be eligible to apply for Special Consideration.

Late penalty
Work submitted late without an approved extension is subject to a late penalty of 10 per cent of the total available marks deducted per calendar day that the assessment is overdue (e.g. if an assignment is out of 40 marks, and is submitted (up to) 24 hours after the deadline without an extension, the student will have four marks deducted from their awarded mark). Work submitted after five calendar days is not accepted and a mark of zero is awarded.
For some assessment tasks a late penalty may not be appropriate – these are clearly indicated in the subject outline. Such assessments receive a mark of zero if not completed by/on the specified date. Examples include:
a. weekly online tests or laboratory work worth a small proportion of the subject mark, or
b. online quizzes where answers are released to students on completion, or
c. professional assessment tasks, where the intention is to create an authentic assessment that has an absolute submission date, or
d. take-home papers that are assessed during a defined time period, or
e. pass/fail assessment tasks.

Querying results
If students wish to query their result in an individual assessment task or the final examination, the process to follow can be found at Querying a mark or grade. The deadline is five working days from the date of release of the result.
If students wish to query their final overall result in a subject, they may request a review of final subject assessment result.
The deadline is five working days from the date of release of the result.

Academic liaison officer
Academic liaison officers (ALOs) are academic staff in each faculty who assist students experiencing difficulties in their studies due to: disability and/or an ongoing health condition; carer responsibilities (e.g. being a primary carer for small children or a family member with a disability); and pregnancy.
ALOs are responsible for approving adjustments to assessment arrangements for students in these categories. Students who require adjustments due to disability and/or an ongoing health condition are requested to discuss their situation with an accessibility consultant at the Accessibility Service before speaking to the relevant ALO.
The ALO for undergraduate students is:
Brian Tucker
telephone +61 2 9514 2627
The ALO for postgraduate students is:
Dr Nham Tran
telephone +61 2 9514 4468

Statement about assessment procedures and advice
This subject outline must be read in conjunction with the policy and procedures for the assessment for coursework subjects, available at:

Statement on copyright
Teaching materials and resources provided to you at UTS are protected by copyright. You are not permitted to re-use these for commercial purposes (including in kind benefit or gain) without permission of the copyright owner. Improper or illegal use of teaching materials may lead to prosecution for copyright infringement.

Statement on plagiarism

Plagiarism and academic integrity
At UTS, plagiarism is defined in Rule 16.2.1(4) as: ‘taking and using someone else’s ideas or manner of expressing them and passing them off as … [their] own by failing to give appropriate acknowledgement of the source to seek to gain an advantage by unfair means’.
The definition implies that if a source is appropriately referenced, the student’s work will meet the required academic standard. Plagiarism is a literary or an intellectual theft and is unacceptable both academically and professionally. It can take a number of forms including but not limited to:
copying any section of text, no matter how brief, from a book, journal, article or other written source without duly acknowledging the source
copying any map, diagram, table or figure without duly acknowledging the source
paraphrasing or otherwise using the ideas of another author without duly acknowledging the source
re-using sections of verbatim text without using quote marks to indicate the text was copied from the source (even if a reference is given).
Other breaches of academic integrity that constitute cheating include but are not limited to:
submitting work that is not a student’s own, copying from another student, recycling another student’s work, recycling previously submitted work, and working with another student in the same cohort in a manner that exceeds the boundaries of legitimate cooperation
purchasing an assignment from a website and submitting it as original work
requesting or paying someone else to write original work, such as an assignment, essay or computer program, and submitting it as original work.
Students who condone plagiarism and other breaches of academic integrity by allowing their work to be copied are also subject to student misconduct Rules.
Where proven, plagiarism and other breaches of academic integrity are penalised in accordance with UTS Student Rules Section 16 – Student misconduct and appeals.
Avoiding plagiarism is one of the main reasons why the Faculty of Engineering and IT is insistent on the thorough and appropriate referencing of all written work. Students may seek assistance regarding appropriate referencing through UTS: HELPS.
Work submitted electronically may be subject to similarity detection software. Student work must be submitted in a format able to be assessed by the software (e.g. doc, pdf (text files), rtf, html).
Further information about avoiding plagiarism at UTS is available.

Retention of student work
The University reserves the right to retain the original or one copy of any work executed and/or submitted by a student as part of the course including, but not limited to, drawings, models, designs, plans and specifications, essays, programs, reports and theses, for any of the purposes designated in Student Rule 3.9.2. Such retention is not to affect any copyright or other intellectual property right that may exist in the student’s work. Copies of student work may be retained for a period of up to five years for course accreditation purposes. Students are advised to contact their subject coordinator if they do not consent to the University retaining a copy of their work.

Statement on UTS email account
Email from the University to a student will only be sent to the student’s UTS email address. Email sent from a student to the University must be sent from the student’s UTS email address. University staff will not respond to email from any other email accounts for currently enrolled students.
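
Worked example: late penalty
The late penalty rule stated under "Late penalty" can be sketched in a few lines of Python. This is an illustration only, not an official UTS calculator. The function name late_penalised_mark and its parameters are hypothetical. The sketch also assumes that any part of a 24-hour period counts as a full calendar day, that work submitted exactly five days late is still accepted, and that a mark cannot fall below zero; the outline itself does not spell out these edge cases.

```python
import math


def late_penalised_mark(awarded, total, hours_late, rate=0.10, cutoff_days=5):
    """Return the mark after applying the late penalty (illustrative only).

    awarded     -- mark awarded before any penalty
    total       -- total available marks for the task
    hours_late  -- hours past the deadline (0 or less means on time)
    rate        -- fraction of total marks deducted per day (10 per cent)
    cutoff_days -- after this many days the work is not accepted
    """
    if hours_late <= 0:
        return float(awarded)

    # Assumption: a part day counts as a whole calendar day.
    days_late = math.ceil(hours_late / 24)

    # Work submitted after five calendar days receives zero.
    if days_late > cutoff_days:
        return 0.0

    penalty = rate * total * days_late
    # Assumption: the penalised mark is floored at zero.
    return max(0.0, awarded - penalty)


if __name__ == "__main__":
    # The outline's own example: out of 40, up to 24 hours late,
    # so four marks are deducted from the awarded mark.
    print(late_penalised_mark(awarded=30, total=40, hours_late=24))
```

With an awarded mark of 30 out of 40 and a submission 24 hours late, the deduction is four marks, matching the example given in the late penalty rule.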
