Как реализовать магию Sqoop для загрузки данных через Spark

от автора

Очень часто приходится слышать, что Sqoop — это серебряная пуля для загрузки данных большого объёма с реляционных БД в Hadoop, особенно с Oracle, и Spark-ом невозможно достигнуть такой производительности. При этом приводят аргументы, что sqoop — это инструмент, заточенный под загрузку, а Spark предназначен для обработки данных.

Меня зовут Максим Петров, я руководитель департамента «Чаптер инженеров данных и разработчиков», и я решил написать инструкцию о том, как правильно и быстро загружать данные Spark, основываясь на принципах загрузки Sqoop.

Первичное сравнение технологий

В нашем примере будем рассматривать загрузку данных из таблиц OracleDB.

Рассмотрим случай, когда нам необходимо полностью перегрузить таблицу/партицию на кластер Hadoop c созданием метаданных hive.

Осторожно, много кода
-- DWH.SERVICE_FEATURE definition   CREATE TABLE "DWH"."SERVICE_FEATURE"    (    "BAN" NUMBER(9,0) NOT NULL ENABLE,     "SUBSCRIBER_NO" CHAR(11) NOT NULL ENABLE,     "SOC" CHAR(9) NOT NULL ENABLE,     "SOC_SEQ_NO" NUMBER(9,0) NOT NULL ENABLE,     "SERVICE_FTR_SEQ_NO" NUMBER(9,0) NOT NULL ENABLE,     "SYS_CREATION_DATE" DATE NOT NULL ENABLE,     "SYS_UPDATE_DATE" DATE,     "OPERATOR_ID" NUMBER(9,0),     "APPLICATION_ID" CHAR(6),     "DL_SERVICE_CODE" CHAR(5),     "DL_UPDATE_STAMP" NUMBER(4,0),     "SOC_EFFECTIVE_DATE" DATE NOT NULL ENABLE,     "CUSTOMER_ID" NUMBER(9,0) NOT NULL ENABLE,     "FEATURE_CODE" CHAR(6) NOT NULL ENABLE,     "SERVICE_TYPE" CHAR(1) NOT NULL ENABLE,     "SOC_LEVEL_CODE" CHAR(1) NOT NULL ENABLE,     "FTR_EFFECTIVE_DATE" DATE NOT NULL ENABLE,     "FTR_EFF_RSN_CODE" CHAR(1),     "FTR_EXPIRATION_DATE" DATE,     "FTR_EXP_RSN_CODE" CHAR(1),     "RC_WAIVER_EFF_DATE" DATE,     "RC_WAIVER_EXPR_DATE" DATE,     "RC_WAIVER_RSN" CHAR(6),     "ADDITIONAL_INFO_AMT" NUMBER(11,2),     "ADDITIONAL_INFO_TYPE" CHAR(1),     "ADDITIONAL_INFO" VARCHAR2(500),     "CHARGE_LEVEL_CODE" CHAR(1),     "BEN" NUMBER(5,0),     "REVENUE_CODE" CHAR(3),     "SOC_RELATED" CHAR(9),     "BL_PROM_SORT_CODE" NUMBER(2,0),     "SERVICE_CLASS" CHAR(3),     "RATE_CODE" CHAR(9),     "RC_EXPIRATION_DATE" DATE,     "RC_ADVPYM_WAIVE_EFF_DT" DATE,     "RC_ADVPYM_WAIVE_EXPR_DT" DATE,     "RC_ADVPYM_WAIVE_RSN" CHAR(6),     "RC_ADVPYM_BILL_SEQ_NO" NUMBER(3,0),     "RENEWAL_RATE_CODE" CHAR(9),     "SECONDARY_TN" CHAR(11),     "CONV_RUN_NO" NUMBER(3,0),     "PORT_IN_START_DATE" DATE,     "PORT_IN_END_DATE" DATE    ) PCTFREE 10 PCTUSED 40 INITRANS 7 MAXTRANS 255  NOCOMPRESS  LOGGING   STORAGE(   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA"   PARTITION BY RANGE ("BAN")  (PARTITION "SERVICE_FEATURE_1"  VALUES LESS THAN (18424929)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_2"  VALUES LESS THAN (43933393)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_3"  VALUES LESS THAN (65938220)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_4"  VALUES LESS THAN (78030882)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_5"  VALUES LESS THAN (89426107)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_6"  VALUES LESS THAN (99364535)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_7"  VALUES LESS THAN (105756681)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_8"  VALUES LESS THAN (111776927)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_9"  VALUES LESS THAN (117780453)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_10"  VALUES LESS THAN (123780786)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_11"  VALUES LESS THAN (129789421)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_12"  VALUES LESS THAN (135790871)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_13"  VALUES LESS THAN (141825258)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_14"  VALUES LESS THAN (147827477)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_15"  VALUES LESS THAN (153829228)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_16"  VALUES LESS THAN (159835744)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_17"  VALUES LESS THAN (165838495)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_18"  VALUES LESS THAN (171850405)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_19"  VALUES LESS THAN (177898201)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_20"  VALUES LESS THAN (183970359)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_21"  VALUES LESS THAN (190047208)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_22"  VALUES LESS THAN (196075759)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_23"  VALUES LESS THAN (202078088)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_24"  VALUES LESS THAN (208151076)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_25"  VALUES LESS THAN (345838310)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_26"  VALUES LESS THAN (351838775)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_27"  VALUES LESS THAN (357839253)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_28"  VALUES LESS THAN (363851043)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_29"  VALUES LESS THAN (369857876)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_30"  VALUES LESS THAN (375861106)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_31"  VALUES LESS THAN (381863539)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_32"  VALUES LESS THAN (387914806)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_33"  VALUES LESS THAN (393921028)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_34"  VALUES LESS THAN (399926438)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_35"  VALUES LESS THAN (405929686)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_36"  VALUES LESS THAN (411936420)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_37"  VALUES LESS THAN (417974400)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_38"  VALUES LESS THAN (423985227)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_39"  VALUES LESS THAN (429992484)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_40"  VALUES LESS THAN (436183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_41"  VALUES LESS THAN (442183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_42"  VALUES LESS THAN (448183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_43"  VALUES LESS THAN (454183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_44"  VALUES LESS THAN (460183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_45"  VALUES LESS THAN (466183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_46"  VALUES LESS THAN (472183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_47"  VALUES LESS THAN (478183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_48"  VALUES LESS THAN (484183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_49"  VALUES LESS THAN (490183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_50"  VALUES LESS THAN (496183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_51"  VALUES LESS THAN (502183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_52"  VALUES LESS THAN (508183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_53"  VALUES LESS THAN (514183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_54"  VALUES LESS THAN (520183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_55"  VALUES LESS THAN (526183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_56"  VALUES LESS THAN (532183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_57"  VALUES LESS THAN (538183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_58"  VALUES LESS THAN (544183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_59"  VALUES LESS THAN (550183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_60"  VALUES LESS THAN (556183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_61"  VALUES LESS THAN (562183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_62"  VALUES LESS THAN (568183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_63"  VALUES LESS THAN (574183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_64"  VALUES LESS THAN (580183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_65"  VALUES LESS THAN (586183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_66"  VALUES LESS THAN (592183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_67"  VALUES LESS THAN (598183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_68"  VALUES LESS THAN (604183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_69"  VALUES LESS THAN (610183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_70"  VALUES LESS THAN (616183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_71"  VALUES LESS THAN (622183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_72"  VALUES LESS THAN (628183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_73"  VALUES LESS THAN (634183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_74"  VALUES LESS THAN (640183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_75"  VALUES LESS THAN (646183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_76"  VALUES LESS THAN (652183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_77"  VALUES LESS THAN (658183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_78"  VALUES LESS THAN (664183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_79"  VALUES LESS THAN (670183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_80"  VALUES LESS THAN (676183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_81"  VALUES LESS THAN (682183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_82"  VALUES LESS THAN (688183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_83"  VALUES LESS THAN (694183736)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC NOLOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" ,  PARTITION "SERVICE_FEATURE_MAX"  VALUES LESS THAN (MAXVALUE)   NO INMEMORY SEGMENT CREATION IMMEDIATE   PCTFREE 0 PCTUSED 40 INITRANS 7 MAXTRANS 255  COMPRESS BASIC LOGGING   STORAGE(INITIAL 8388608 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645   PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1   BUFFER_POOL KEEP FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)   TABLESPACE "DATA" )   PARALLEL 8 ; 

Будем загружать самую большую партицию SERVICE_FEATURE_MAX (14 млрд строк)

Загрузка с помощью Sqoop будет выглядеть следующим образом:

sqoop import \ --num-mappers 8 \ --fetch-size 100000 \ -Dmapreduce.task.timeout=0 \ -D mapreduce.map.memory.mb=16384 \ -Doraoop.import.partitions='"SERVICE_FEATURE_MAX"' \ --hcatalog-database customer360_stg \ --hcatalog-table customer_oracle_sqoop \ --connect jdbc:oracle:thin:@//10.31.178.66:1521/ORCL  \ --username user \ --password password \ --table DWH.SERVICE_FEATURE \ --drop-and-create-hcatalog-table \ --hcatalog-storage-stanza 'stored as orcfile' \ --direct 

Потребляемые ресурсы: 9 Cores, 144GB RAM

Время загрузки: ~ 26 часов (14 часов, если загружать в текстовом формате хранения)

Теперь попробуем загрузить эту же таблицу через Spark.

Будем сразу же грузить в 8 потоков, как в Sqoop.

Применив способ, который описан в документации по Spark, и взяв псевдостолбец ROWNUM за поле, по которому будем делить таблицу (можно использовать другое поле, по которому есть равномерное распределение и можно разделить на порции), мы получим вот такой код:

//Получаем максимальное значение rownum val max = spark.   read.   format("jdbc").   option("url", "jdbc:oracle:thin:@//10.31.178.66:1521/ORCL").   option("user", "user").   option("password", "password").   option("driver", "oracle.jdbc.driver.OracleDriver").   option("dbtable", s"( select max(rownum) as row_num FROM DWH.SERVICE_FEATURE PARTITION (SERVICE_FEATURE))").   load().   collect().   map(_.getDecimal(0)).   head.toBigInteger   //Загружаем данные val etlDF = spark.   read.   format("jdbc").   option("url", "jdbc:oracle:thin:@//10.31.178.66:1521/ORCL").   option("user", "user").   option("password", "password").   option("driver", "oracle.jdbc.driver.OracleDriver").   option("dbtable", "( select a.*, rownum as row_num FROM DWH.SERVICE_FEATURE PARTITION (SERVICE_FEATURE) a)" ).   option("partitionColumn", "row_num").   option("upperBound", max.toString().toInt).   option("lowerBound", 0).   option("numPartitions", 8).   option("fetchSize", "10000").   option("driver", "oracle.jdbc.driver.OracleDriver").   load()   //сохраняем etlDF.   write.   mode(SaveMode.Overwrite).   format("orc").   saveAsTable("customer360_stg.customer_oracle_spark") 

Потребляемые ресурсы: 9 Cores, 132GB RAM

Время загрузки: Принудительно завершил загрузку после 35 часов

Почему так происходит?

После таких результатов в основном делают выводы, что сравнение загрузок через Sqoop и Spark завершено и предпочтение надо отдавать Sqoop.

Давайте попробуем разобраться, в чём же магия загрузки через Sqoop, ведь и Sqoop, и Spark используют JDBC-соединение (которое никак не кастомизировалось, кроме как fetchSize), с одинаковым количеством сессий. Но при этом Sqoop использует устаревшую технологию MapReduce, а Spark все операции проделывает в оперативной памяти, что должно сказаться на производительности при преобразовании данных в формат ORC. 

Но это не так, так как в нашем случае при загрузке через Spark все 8 потоков делают fullscan партиции, отфильтруют нужные значения и отправят их на executor, а так как объёмы данных большие, то они не будут помещаться в память и будет происходить spill to disk и много еще других операций, которые будут только увеличивать время загрузки. Чтобы избавится от спила, необходимо увеличивать число параллелей, а это увеличивает число сессий и фулсканов таблицы.

И для того, чтобы начать эффективно загружать таблицу, необходимо разобраться, почему Sqoop не делает fullscan для каждой сессии, а знает, что ему прочитать с максимальной эффективностью, так, что каждая сессия знает, какой ей блок прочитать. О блока, экстентах, сегментах можно прочитать вот здесь.

Если запустить загрузку через Sqoop и посмотреть, какие запросы делает загрузчик к базе данных, то там можно обнаружить вот такой интересный запрос:

    SELECT       data_object_id,       file_id,       relative_fno,       file_batch,       MIN (start_block_id) start_block_id,       MAX (end_block_id) end_block_id,       SUM (blocks) blocks         FROM           (SELECT              o.data_object_id,              e.file_id,              e.relative_fno,              e.block_id start_block_id,              e.block_id + e.blocks - 1 end_block_id,              e.blocks,              CEIL (SUM(e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id ORDER BY e.block_id ASC) /                (SUM (e.blocks)OVER (PARTITION BY o.data_object_id,e.file_id) / 21)) file_batch           FROM             dba_extents e,             dba_objects o,             dba_tab_subpartitions tsp           WHERE             o.owner = 'DWH'             AND o.object_name = 'SERVICE_FRATURE'             AND e.owner = 'DWH'             AND e.segment_name = 'SERVICE_FRATURE'             AND o.owner = e.owner             AND o.object_name = e.segment_name             AND (o.subobject_name = e.partition_name               OR (o.subobject_name IS NULL                 AND e.partition_name IS NULL))             AND o.owner = tsp.table_owner(+)             AND o.object_name = tsp.table_name(+)             AND o.subobject_name = tsp.subpartition_name(+))     GROUP BY       data_object_id,       file_id,       relative_fno,       file_batch     ORDER BY data_object_id,       file_id,       relative_fno,       file_batch 

Результат этого запроса будет выглядеть следующим образом:

Посмотрим, какую информацию выводит запрос, обратившись к документации Oracle:

Column

Description

Comment

1

DATA_OBJECT_ID

Dictionary object number of the segment that contains the object

 

2

FILE_ID

File identifier number of the file containing the extent

 

3

RELATIVE_FNO

Relative file number of the first extent block

 

4

FILE_BATCH

Number оf the batch in file

 

5

START_BLOCK_ID

Starting block number of the extent

 

6

END_BLOCK_ID

Block number of the extent

START_BLOCK_ID + BLOCKS — 1

7

BLOCKS

Size of the extent in Oracle blocks

 

8

SUBOBJECT_NAME

Name of the subobject (for example, partition)

Поле в выводе не участвует, но в дальнейшем оно нам понадобится

На основании полученных данных Sqoop строит свои запросы для выгрузки, которые выглядят так:

SELECT /*+ NO_INDEX(t) */ CUSTOMER_ID,SYS_CREATION_DATE,SYS_UPDATE_DATE,OPERATOR_ID,APPLICATION_ID,DL_SERVICE_CODE,DL_UPDATE_STAMP,CONTACT_TELNO,CONTACT_TN_EXTNO,CONTACT_FAXNO,CONTACT_COUNTRY_CODE,WORK_TELNO,WORK_TN_EXTNO,HOME_TELNO,OTHER_TELNO,OTHER_EXTNO,OTHER_TN_TYPE,BIRTH_DATE,EMPLOYMENT_DT,EMPLYR_NAME,EMPLOYEE_QTY,YEARS_IN_BUSINESS,EMPLOYEE_POSITION,BUSINESS_EST_DATE,MARKET_ID,GUR_CR_CARD_TYPE,GUR_CR_CARD_NO,GUR_CR_CARD_EXP_DT,ACC_PASSWORD,VERIFIED_DATE,COMPANY_REGISTER_NUM,BUSINESS_CATEGORY,CONV_RUN_NO,LEGACY_CST_NUM,E_MAIL_ADDR,MAILSHOT,IMEI_BLACK_LIST_PASS,COMPANY_TYPE,CUST_GENDER,CUST_PERSONAL_ID,CUST_NATIONALITY,CUST_DOC_TYPE,CUST_DOC_ID,CUST_DOC_NO,CUST_DOC_DATE,CUST_DOC_ISSUE,EMP_NO,AUTPRSN_BIRTH_DATE,AUTPRSN_PERSONAL_ID,GUARANTOR_TYPE,GUR_PERSONAL_ID,GUR_COMPANY_REG_NUM,START_TIME_AT_BANK,GUR_BIRTH_DATE,GUR_TAX_NUMBER,REGISTER_IND,CONTACT_TELNO_2ND,CONTACT_TN_EXTNO_2ND,CONTACT_FAXNO_2ND,CONTACT_COUNTRY_CODE_2ND,CREDIT_CARD_IND,AUTHORIZED_SEC,LEGAL_NAME_FORM,'1024_4' data_chunk_id  FROM DWH.CUSTOMER  t  WHERE ((rowid >= dbms_rowid.rowid_create(1, 510042, 1024, 127475712, 0) AND rowid <= dbms_rowid.rowid_create(1, 510042, 1024, 127696895, 32767))) UNION ALL  SELECT /*+ NO_INDEX(t) */ CUSTOMER_ID,SYS_CREATION_DATE,SYS_UPDATE_DATE,OPERATOR_ID,APPLICATION_ID,DL_SERVICE_CODE,DL_UPDATE_STAMP,CONTACT_TELNO,CONTACT_TN_EXTNO,CONTACT_FAXNO,CONTACT_COUNTRY_CODE,WORK_TELNO,WORK_TN_EXTNO,HOME_TELNO,OTHER_TELNO,OTHER_EXTNO,OTHER_TN_TYPE,BIRTH_DATE,EMPLOYMENT_DT,EMPLYR_NAME,EMPLOYEE_QTY,YEARS_IN_BUSINESS,EMPLOYEE_POSITION,BUSINESS_EST_DATE,MARKET_ID,GUR_CR_CARD_TYPE,GUR_CR_CARD_NO,GUR_CR_CARD_EXP_DT,ACC_PASSWORD,VERIFIED_DATE,COMPANY_REGISTER_NUM,BUSINESS_CATEGORY,CONV_RUN_NO,LEGACY_CST_NUM,E_MAIL_ADDR,MAILSHOT,IMEI_BLACK_LIST_PASS,COMPANY_TYPE,CUST_GENDER,CUST_PERSONAL_ID,CUST_NATIONALITY,CUST_DOC_TYPE,CUST_DOC_ID,CUST_DOC_NO,CUST_DOC_DATE,CUST_DOC_ISSUE,EMP_NO,AUTPRSN_BIRTH_DATE,AUTPRSN_PERSONAL_ID,GUARANTOR_TYPE,GUR_PERSONAL_ID,GUR_COMPANY_REG_NUM,START_TIME_AT_BANK,GUR_BIRTH_DATE,GUR_TAX_NUMBER,REGISTER_IND,CONTACT_TELNO_2ND,CONTACT_TN_EXTNO_2ND,CONTACT_FAXNO_2ND,CONTACT_COUNTRY_CODE_2ND,CREDIT_CARD_IND,AUTHORIZED_SEC,LEGAL_NAME_FORM,'1024_12' data_chunk_id  FROM DWH.CUSTOMER  t  WHERE ((rowid >= dbms_rowid.rowid_create(1, 510042, 1024, 129204224, 0) AND rowid <= dbms_rowid.rowid_create(1, 510042, 1024, 129417215, 32767))) 

Таких запросов будет столько, сколько вами указано потоков загрузки. У этих запросов есть один минус – бывает, что количество FILE_BATCH в десятки раз превышает число потоков, тем самым увеличивая количество UNION для одной сессии, и так как этот запрос выполняется на стороне Oracle, то это сказывается на производительности базы.

Запрос ничем не отличается от обычного селекта, который мы делали в нашей первой реализации Spark выгрузки, кроме того, по какому полю он фильтрует данные — rowid

Реализация быстрой загрузки с помощью Spark

Опираясь на полученные знания о том, как строит свои запросы Sqoop, и проблемы, с которыми мы можем столкнуться при его использовании, напишем свою реализацию этого подхода выгрузки.

Пререквезиты:

Для ТУЗ, из-под которой будем делать загрузку, должны быть следующие права (этого требует и Sqoop):

grant select on v_$instance to user; grant select on dba_tables to user; grant select on dba_tab_columns to user; grant select on dba_objects to user; grant select on dba_extents to user; grant select on dba_segments to user; grant select on dba_constraints to user; grant select on v_$database to user; grant select on v_$parameter to user; grant select on dba_tab_subpartitions to user; 

Так как мы хотим написать переиспользуемое приложение, то параметризируем его:

  //количество потоков   val cntSession = 8       //на сколько порций разделим нашу загрузку - аналог FILE_BATCH у Sqoop   val cntChunk = 512     //параметр JDBC подключения   val jdbcUrl = "jdbc:oracle:thin:@//10.31.178.66:1521/ORCL"   val fetchSize = 100000    val orclUsr = "user"   val orclPwd = "password"       //метаданные для источника   val orclSchema = "DWH"   val orclTable = "SERVICE_FEATURE"   val orclPartition = "SERVICE_FEATURE_MAX"     //метаданные для приёмника   val hivePath = "/warehouse/tablespace/external/hive/customer360_stg.db/service_feature_orcl"   val hiveTable = "customer360_stg.service_feature_orcl"   val trgNumFiles = 400 

Реализуем получение метаинформации для наших чанков, добавив в запрос Sqoop дополнительную информацию о имени партиции, чтобы можно было выгружать отдельные партиции.

  val etlDFChunkAll = spark.     read.     format("jdbc").     option("url", jdbcUrl).     option("user", orclUsr).     option("password", orclPwd).     option("driver", "oracle.jdbc.driver.OracleDriver").     option("dbtable",       s"""(     SELECT       data_object_id,       file_id,       relative_fno,       file_batch,       subobject_name,       MIN (start_block_id) start_block_id,       MAX (end_block_id) end_block_id,       SUM (blocks) blocks         FROM           (SELECT              o.data_object_id,              o.subobject_name,              e.file_id,              e.relative_fno,              e.block_id start_block_id,              e.block_id + e.blocks - 1 end_block_id,              e.blocks,              CEIL (SUM(e.blocks) OVER (PARTITION BY o.data_object_id, e.file_id ORDER BY e.block_id ASC) /                (SUM (e.blocks)OVER (PARTITION BY o.data_object_id,e.file_id) / $cntChunk)) file_batch           FROM             dba_extents e,             dba_objects o,             dba_tab_subpartitions tsp           WHERE             o.owner = '$orclSchema'             AND o.object_name = '$orclTable'             AND e.owner = '$orclSchema'             AND e.segment_name = '$orclTable'             AND o.owner = e.owner             AND o.object_name = e.segment_name             AND (o.subobject_name = e.partition_name               OR (o.subobject_name IS NULL                 AND e.partition_name IS NULL))             AND o.owner = tsp.table_owner(+)             AND o.object_name = tsp.table_name(+)             AND o.subobject_name = tsp.subpartition_name(+))     GROUP BY       data_object_id,       file_id,       relative_fno,       file_batch,       subobject_name     ORDER BY data_object_id,       file_id,       relative_fno,       file_batch,       subobject_name)""").     load() 

Определяем, если надо выгрузить только партицию.

  val etlDFChunk =     if (orclPartition == "")       etlDFChunkAll     else etlDFChunkAll.       filter($"subobject_name" === orclPartition) 

Получаем список полей (админы Oracle запрещают выгружать по *)

  val lstColumn = spark.     read.     format("jdbc").     option("url", jdbcUrl).     option("user", orclUsr).     option("password", orclPwd).     option("driver", "oracle.jdbc.driver.OracleDriver").     option("dbtable", s"( SELECT * FROM $orclSchema.$orclTable WHERE 1=0 )").     load().     columns.     toList 

Реализуем функцию, которая генерирует запрос, и получим список сгруппированных запросов.

Каждая группа содержит столько запросов, сколько у нас должно быть потоков (8), чтобы каждая группа выполнялась в параллель.

  def mkQuery(lstColumn: List[String]) = udf((data_object_id: Integer,                                               relative_fno: Integer,                                               start_block_id: Integer,                                               end_block_id: Integer) =>     s"""SELECT /*+ NO_INDEX(t) */           ${lstColumn.mkString(", ")}          FROM            $orclSchema.$orclTable          WHERE            ((rowid >= dbms_rowid.rowid_create(1, $data_object_id, $relative_fno, $start_block_id, 0)              AND rowid <= dbms_rowid.rowid_create(1, $data_object_id, $relative_fno, $end_block_id, 32767)))""")     val lstQueries = etlDFChunk.     withColumn("query",       mkQuery(lstColumn)($"data_object_id" cast (IntegerType),         $"relative_fno" cast (IntegerType),         $"start_block_id" cast (IntegerType),         $"end_block_id" cast (IntegerType))).     select($"relative_fno", $"file_batch" cast (IntegerType), $"query").     collect().     map(row => s"""${row.getInt(0)}_${row.getInt(1)}""" -> row.getString(2)).toMap.grouped(cntSession).     toList 

Выполняем запросы в параллель, через параллельные коллекции Scala, записывая результаты во временную директорию.

  lstQueries.foreach{ lst =>     val lstPar = lst.par     lstPar.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(cntSession))     lstPar.foreach{query =>       spark.         read.         format("jdbc").         option("url", jdbcUrl).         option("user", orclUsr).         option("password", orclPwd).         option("driver", "oracle.jdbc.driver.OracleDriver").         option("dbtable", s"( ${query._2} )").         option("fetchSize", fetchSize).         load().         write.         mode(SaveMode.Overwrite).         orc(s"${hivePath}_tmp/part_${query._1}")     }   } 

Читаем полученный результат, объединяем, записываем в целевую таблицу, очищаем временную директорию.

  val mapQuery = lstQueries.flatten.toMap   val trgDf = mapQuery.map { query =>     spark.       read.       orc(s"${hivePath}_tmp/part_${query._1}")   }.     reduce(_ union (_))     trgDf.     coalesce(trgNumFiles).     write.     mode(SaveMode.Overwrite).     option("path", hivePath).     format("orc").     saveAsTable(hiveTable)     val confFs = spark.sparkContext.hadoopConfiguration   val fs = FileSystem.get(confFs)   val path = new Path(s"${hivePath}_tmp")     if (fs.exists(path)) fs.delete(path, true) } 

Результат работы:

Потребляемые ресурсы: 9 Cores, 132GB RAM

Время загрузки: 12 часов

Итоги

Как мы видим, всё-таки Spark можно, даже нужно использовать для загрузки данных, только стоит оптимально написать приложение и тогда результат будет на уровне Sqoop или лучше.

В этой статье рассмотрен один из вариантов того, как грузит Sqoop, возможно, есть и другие методы загрузки, которые он использует у себя под капотом. Но, используя такую методологию, всю эту магию можно реализовать в Spark. Возможно, вам это и не придётся делать и будет достаточно переиспользования кода этой статьи.

P.S. Код можно завернуть в jar, вынеся настройку через параметры, и использовать его в своих загрузках.


ссылка на оригинал статьи https://habr.com/ru/company/beeline/blog/679876/


Комментарии

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *