
This assignment is to load data from a CSV file, populate a SQLite database with it, and run in-database analytics with SQL.
Data Source
The data is given in CSV format. (The link to the .csv file has been removed.) Below are sample rows from the 30k-row CSV:
ID,LIMIT_BAL,SEX,EDUCATION,MARRIAGE,AGE,PAY_0,PAY_2,PAY_3,PAY_4,PAY_5,PAY_6,BILL_AMT1,BILL_AMT2,BILL_AMT3,BILL_AMT4,BILL_AMT5,BILL_AMT6,PAY_AMT1,PAY_AMT2,PAY_AMT3,PAY_AMT4,PAY_AMT5,PAY_AMT6,default.payment.next.month
1,20000,2,2,1,24,2,2,-1,-1,-2,-2,3913,3102,689,0,0,0,0,689,0,0,0,0,1
2,120000,2,2,2,26,-1,2,0,0,0,2,2682,1725,2682,3272,3455,3261,0,1000,1000,1000,0,2000,1
3,90000,2,2,2,34,0,0,0,0,0,0,29239,14027,13559,14331,14948,15549,1518,1500,1000,1000,1000,5000,0
4,50000,2,2,3,37,0,0,0,0,0,0,46990,48233,49291,28314,28959,29547,2000,2019,1200,1100,1069,1000,0
5,50000,1,2,1,57,-1,0,-1,0,0,0,-8617,5670,35835,20940,19146,19131,2000,36681,10000,9000,689,679,0
According to the site:
This dataset contains information on default payments, demographic factors, credit data, history of payment, and bill statements of credit card clients in Taiwan from April 2005 to September 2005.
Write Python code using the sqlite3 module to do the following.
SQLite Database Setup
Create a new database, connect to it, and create a table structure corresponding to data fields in the CSV data source.
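A minimal sketch of this step, assuming Python's built-in sqlite3 module, a hypothetical database file name `credit_default.db`, and a hypothetical table name `credit_card`. Every column is declared INTEGER because all values in the sample rows are integers, and the label column `default.payment.next.month` is renamed to `DEFAULT_PAYMENT_NEXT_MONTH` (also an assumption) so it can be used in SQL without quoting.

```python
import sqlite3

# Column names taken from the CSV header. The last header,
# "default.payment.next.month", is renamed (an assumption) because dots in
# an SQL identifier would need double-quoting in every statement.
COLUMNS = (
    ["ID", "LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE", "PAY_0"]
    + [f"PAY_{i}" for i in range(2, 7)]      # PAY_2 .. PAY_6 (there is no PAY_1)
    + [f"BILL_AMT{i}" for i in range(1, 7)]  # BILL_AMT1 .. BILL_AMT6
    + [f"PAY_AMT{i}" for i in range(1, 7)]   # PAY_AMT1 .. PAY_AMT6
    + ["DEFAULT_PAYMENT_NEXT_MONTH"]
)


def create_database(path="credit_default.db"):
    """Connect to (and thereby create) the database file, then build the table."""
    conn = sqlite3.connect(path)  # sqlite3 creates the file if it does not exist
    # Every field in the sample rows is an integer; ID serves as the primary key.
    col_defs = ", ".join(
        f"{name} INTEGER PRIMARY KEY" if name == "ID" else f"{name} INTEGER"
        for name in COLUMNS
    )
    # Dropping first makes the script safe to re-run from scratch.
    conn.execute("DROP TABLE IF EXISTS credit_card")
    conn.execute(f"CREATE TABLE credit_card ({col_defs})")
    conn.commit()
    return conn
```

Calling `conn = create_database()` creates the file in the current directory; passing `":memory:"` instead gives a throwaway in-memory database, which is handy for testing.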
Load Data into SQLite
Read data from the CSV file, use a loop to read each CSV line (data instance), and insert it into the SQLite table.
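One way to do the loading with the standard csv module, reading and inserting one line at a time as the task asks. This sketch assumes the same hypothetical `credit_card` table and renamed label column `DEFAULT_PAYMENT_NEXT_MONTH`, and that the CSV columns appear in the order shown in the sample; it creates the table if it does not exist yet, so it can also run on its own.

```python
import csv
import sqlite3

# Column order as in the CSV header; the label column name is an assumption.
COLUMNS = (
    ["ID", "LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE", "PAY_0"]
    + [f"PAY_{i}" for i in range(2, 7)]
    + [f"BILL_AMT{i}" for i in range(1, 7)]
    + [f"PAY_AMT{i}" for i in range(1, 7)]
    + ["DEFAULT_PAYMENT_NEXT_MONTH"]
)


def load_csv(conn, csv_file):
    """Insert one table row per CSV data line and return how many were inserted."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS credit_card ({})".format(
            ", ".join(f"{col} INTEGER" for col in COLUMNS)
        )
    )
    # Parameterized INSERT: one "?" placeholder per column.
    insert_sql = "INSERT INTO credit_card ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join("?" for _ in COLUMNS)
    )
    reader = csv.reader(csv_file)
    next(reader)  # skip the header line; COLUMNS defines the column order
    inserted = 0
    for row in reader:  # each iteration handles one data instance
        if not row:  # tolerate blank lines
            continue
        # int(float(...)) accepts both "20000" and "20000.0"
        conn.execute(insert_sql, [int(float(value)) for value in row])
        inserted += 1
    conn.commit()  # one commit at the end is far faster than committing per row
    return inserted
```

Usage, with a hypothetical file name: `with open("credit_card.csv", newline="") as f: load_csv(sqlite3.connect("credit_default.db"), f)`. Opening with `newline=""` is what the csv module documentation recommends for files passed to `csv.reader`.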
In-database Query and Analytics with SQLite Data
Write Python code with the related SQL statements to do the following:
Update the data so that marriage=2 (single) and marriage=3 (others) are merged into 2 (single).
Remove all data records with negative BILL_AMT values (in any of BILL_AMT1 through BILL_AMT6);
Select and show the first 10 records in the database table, using SELECT … LIMIT …;
Select and show all records with a BILL_AMT1 amount greater than 500k;
Compute the total number of records, average AGE, min LIMIT_BAL, max LIMIT_BAL in the data;
Count the # records, average AGE, min LIMIT_BAL, max LIMIT_BAL for default.payment.next.month = 0 (no default) vs. default.payment.next.month = 1 (default), using GROUP BY;
Count the # records, average AGE, min LIMIT_BAL, max LIMIT_BAL for each marriage group (1, 2), again using GROUP BY;
Count the # records in each marriage group who will default (1) vs. not default (0).
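The tasks above could be expressed roughly as follows. This is a sketch that assumes a `credit_card` table whose columns are named as in the CSV header, with the label column renamed to `DEFAULT_PAYMENT_NEXT_MONTH`; the function names `clean` and `analyze` are hypothetical.

```python
import sqlite3

BILL_COLUMNS = [f"BILL_AMT{i}" for i in range(1, 7)]
DEFAULT_COL = "DEFAULT_PAYMENT_NEXT_MONTH"  # assumed name of the label column
STATS = "COUNT(*), AVG(AGE), MIN(LIMIT_BAL), MAX(LIMIT_BAL)"


def clean(conn):
    """Merge MARRIAGE 3 into 2, then delete rows with any negative bill amount."""
    conn.execute("UPDATE credit_card SET MARRIAGE = 2 WHERE MARRIAGE = 3")
    any_negative = " OR ".join(f"{col} < 0" for col in BILL_COLUMNS)
    conn.execute(f"DELETE FROM credit_card WHERE {any_negative}")
    conn.commit()


def analyze(conn):
    """Run the SELECT tasks and return each result set, keyed by a short name."""
    queries = {
        # Without ORDER BY, which rows come "first" is not guaranteed.
        "first_10": "SELECT * FROM credit_card ORDER BY ID LIMIT 10",
        "bill1_over_500k": "SELECT * FROM credit_card WHERE BILL_AMT1 > 500000",
        "overall": f"SELECT {STATS} FROM credit_card",
        "by_default": f"SELECT {DEFAULT_COL}, {STATS} FROM credit_card "
                      f"GROUP BY {DEFAULT_COL} ORDER BY {DEFAULT_COL}",
        "by_marriage": f"SELECT MARRIAGE, {STATS} FROM credit_card "
                       "GROUP BY MARRIAGE ORDER BY MARRIAGE",
        # Two-column GROUP BY: one count per (marriage, default) pair.
        "marriage_x_default": f"SELECT MARRIAGE, {DEFAULT_COL}, COUNT(*) "
                              f"FROM credit_card GROUP BY MARRIAGE, {DEFAULT_COL} "
                              f"ORDER BY MARRIAGE, {DEFAULT_COL}",
    }
    return {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
```

Usage: call `clean(conn)` once, then `for name, rows in analyze(conn).items(): print(name, rows)`. If the full file contains MARRIAGE values other than 1, 2, and 3, they will show up as extra groups in the marriage queries.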
Next step: using Python with MongoDB
Write Python code to load the same data into MongoDB and conduct the same analysis as above.
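A sketch of the MongoDB version, assuming the pymongo driver, a server at the default local URI, and hypothetical database and collection names `credit` and `credit_card`. Field names follow the CSV header, except that the label is stored as `DEFAULT_PAYMENT_NEXT_MONTH` (an assumption): in MongoDB queries and aggregation expressions, a dotted name such as `default.payment.next.month` is read as a path into nested documents. The SQL operations map to `update_many`, `delete_many`, `find`, and `$group` aggregation stages; pymongo is imported inside `run()` so the pipeline definitions can be inspected without a server.

```python
import csv

# Same field names as the CSV header; the label column is renamed (assumption).
COLUMNS = (
    ["ID", "LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE", "PAY_0"]
    + [f"PAY_{i}" for i in range(2, 7)]
    + [f"BILL_AMT{i}" for i in range(1, 7)]
    + [f"PAY_AMT{i}" for i in range(1, 7)]
    + ["DEFAULT_PAYMENT_NEXT_MONTH"]
)


def csv_to_docs(csv_file):
    """Yield one document (dict) per CSV data line."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header line
    for row in reader:
        if row:
            yield dict(zip(COLUMNS, (int(float(v)) for v in row)))


def stats_stage(group_key):
    """$group stage: record count, average AGE, min and max LIMIT_BAL."""
    return {"$group": {
        "_id": group_key,  # None puts the whole collection in one group
        "count": {"$sum": 1},
        "avg_age": {"$avg": "$AGE"},
        "min_limit_bal": {"$min": "$LIMIT_BAL"},
        "max_limit_bal": {"$max": "$LIMIT_BAL"},
    }}


# Aggregation equivalents of the SQL aggregate and GROUP BY queries.
PIPELINES = {
    "overall": [stats_stage(None)],
    "by_default": [stats_stage("$DEFAULT_PAYMENT_NEXT_MONTH"), {"$sort": {"_id": 1}}],
    "by_marriage": [stats_stage("$MARRIAGE"), {"$sort": {"_id": 1}}],
    "marriage_x_default": [
        {"$group": {"_id": {"marriage": "$MARRIAGE",
                            "default": "$DEFAULT_PAYMENT_NEXT_MONTH"},
                    "count": {"$sum": 1}}},
        {"$sort": {"_id.marriage": 1, "_id.default": 1}},
    ],
}

# Filter matching documents with a negative value in any BILL_AMTn field.
NEGATIVE_BILL = {"$or": [{f"BILL_AMT{i}": {"$lt": 0}} for i in range(1, 7)]}


def run(csv_path, uri="mongodb://localhost:27017"):
    """Load the CSV into MongoDB and repeat the analysis done in SQLite."""
    from pymongo import MongoClient  # needs pymongo and a running server

    coll = MongoClient(uri)["credit"]["credit_card"]  # hypothetical names
    coll.drop()  # start from an empty collection on every run
    with open(csv_path, newline="") as f:
        coll.insert_many(list(csv_to_docs(f)))
    coll.update_many({"MARRIAGE": 3}, {"$set": {"MARRIAGE": 2}})
    coll.delete_many(NEGATIVE_BILL)
    no_id = {"_id": 0}  # hide MongoDB's generated _id in printed output
    print(list(coll.find({}, no_id).sort("ID", 1).limit(10)))
    print(list(coll.find({"BILL_AMT1": {"$gt": 500000}}, no_id)))
    for name, pipeline in PIPELINES.items():
        print(name, list(coll.aggregate(pipeline)))
```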
