Input: 1) National database DB; 2) Response feature CU; 3) Number of respondents N

Output: Target dataset T_Atr

1. T_Atr = [ ]

2. for k = 1, 2, ..., N do

3. Data types: Identify as categorical or continuous

4. Target variable: Identify as categorical or continuous

5. Apply data transformation when appropriate

6. for continuous variables do

7. Compute the correlation test

8. end for

9. for categorical variables do

10. Compute the chi-square test using Equation (5)

11. end for

12. Established knowledge: Identify as Positive, Negative, Neutral, Unstudied

13. Expert's claim: State in a scientific manner

14. Simplicity in time and interpretability: Identify as yes or no

15. Practicability and Applicability: Identify as yes or no

16. end for

17. Obtain the attribute vector of the k-th respondent

18. return T_Atr
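
The statistical screening in steps 6 to 11 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (pearson_r, chi_square, screen_features) and the toy data are hypothetical. Equation (5) is not reproduced in this listing, so the standard Pearson chi-square statistic is assumed. The sketch also assumes a binary target coded 0/1, so that both a correlation and a chi-square statistic can be computed against it.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def chi_square(x, y):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists.

    Assumed form (standard definition): sum over cells of (O - E)^2 / E.
    """
    n = len(x)
    observed = Counter(zip(x, y))          # joint counts per (row, column) cell
    row_tot, col_tot = Counter(x), Counter(y)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof


def screen_features(features, kinds, target):
    """Apply the test matching each feature's type (steps 6-11 of the listing)."""
    results = {}
    for name, values in features.items():
        if kinds[name] == "continuous":
            results[name] = ("correlation", pearson_r(values, target))
        else:
            results[name] = ("chi-square", chi_square(values, target))
    return results


# Toy data, invented purely for illustration (not taken from any database).
features = {
    "age": [23, 35, 47, 51, 62, 29],
    "smoker": ["yes", "no", "yes", "yes", "no", "no"],
}
kinds = {"age": "continuous", "smoker": "categorical"}
target = [0, 0, 1, 1, 1, 0]  # assumed binary response feature

for name, (test, value) in screen_features(features, kinds, target).items():
    print(name, test, value)
```

The sketch returns raw statistics only. Turning the chi-square statistic into a p-value requires the chi-square distribution with the reported degrees of freedom, for which a statistics library would normally be used.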