{"id":280864,"date":"2016-11-13T07:50:04","date_gmt":"2016-11-13T04:50:04","guid":{"rendered":"http:\/\/savepearlharbor.com\/?p=280864"},"modified":"-0001-11-30T00:00:00","modified_gmt":"-0001-11-29T21:00:00","slug":"","status":"publish","type":"post","link":"https:\/\/savepearlharbor.com\/?p=280864","title":{"rendered":"Implementing text classification with a convolutional network in Keras"},"content":{"rendered":"<p>Oddly enough, this post is about a text classifier built on a convolutional network (vectorizing individual words is a separate question). 
<a href=\"https:\/\/bitbucket.org\/alex43210\/pynlc\">The code, test data and usage examples<\/a> are on bitbucket (I ran into github's file size limits and its suggestion to use Git Large File Storage (LFS), and have not yet gotten around to setting that up).<\/p>\n<h2>Datasets<\/h2>\n<p>  <\/p>\n<p>I used converted versions of the following datasets: <a href=\"http:\/\/www.daviddlewis.com\/resources\/testcollections\/reuters21578\/\">http:\/\/www.daviddlewis.com\/resources\/testcollections\/reuters21578\/<\/a> (22000 records), <a href=\"https:\/\/github.com\/watson-developer-cloud\/car-dashboard\/blob\/master\/training\/car_workspace.json\">https:\/\/github.com\/watson-developer-cloud\/car-dashboard\/blob\/master\/training\/car_workspace.json<\/a> (530 records), <a href=\"https:\/\/github.com\/watson-developer-cloud\/natural-language-classifier-nodejs\/blob\/master\/training\/weather_data_train.csv\">https:\/\/github.com\/watson-developer-cloud\/natural-language-classifier-nodejs\/blob\/master\/training\/weather_data_train.csv<\/a> (50 records). 
<i>By the way, I would welcome a Russian-language text dataset dropped in the comments or a PM (comments preferred).<\/i><\/p>\n<h2>Network architecture<\/h2>\n<p>  <\/p>\n<p>The network is based on one implementation of the architecture described in <a href=\"https:\/\/arxiv.org\/abs\/1408.5882\">https:\/\/arxiv.org\/abs\/1408.5882<\/a>. The code of that implementation is at <a href=\"https:\/\/github.com\/alexander-rakhlin\/CNN-for-Sentence-Classification-in-Keras\">https:\/\/github.com\/alexander-rakhlin\/CNN-for-Sentence-Classification-in-Keras<\/a>.<br \/>  In my case the network input consists of word vectors (the gensim implementation of word2vec is used). 
The network structure is shown below: <br \/>  <img decoding=\"async\" src=\"https:\/\/habrastorage.org\/files\/247\/f7a\/89d\/247f7a89d71d4f2ea3ce893b837b8589.png\"\/><br \/>  In short:  <\/p>\n<ul>\n<li>A text is represented as a matrix of shape word_count x word_vector_size. The vectors of individual words come from word2vec, which you can read about, for example, <a href=\"https:\/\/habrahabr.ru\/post\/253227\/\">in this post<\/a>. Since I do not know in advance what text a user will submit, I set the length to 2 * N, where N is the number of vectors in the longest text of the training set. 
Yes, that is a shot in the dark.<\/li>\n<li>The matrix is processed by the convolutional part of the network (its output is the transformed word features)<\/li>\n<li>The extracted features are processed by the fully connected part of the network<\/li>\n<\/ul>\n<p>  I filter out stop words beforehand (this made no difference on the Reuters dataset, but it did matter on the smaller ones). More on that below.  
<\/p>\n<p>  <a name=\"habracut\"><\/a><\/p>\n<h2>Installing the required software (keras\/theano, cuda) on Windows<\/h2>\n<p>  <\/p>\n<p>Installation on linux was noticeably simpler. It required:  <\/p>\n<ul>\n<li>python3.5<\/li>\n<li>python header files (python-dev on debian)<\/li>\n<li>gcc<\/li>\n<li>cuda<\/li>\n<li>python libraries: the same ones as in the list below<\/li>\n<\/ul>\n<p>  In my case, on win10 x64, the sequence was roughly as follows:  <\/p>\n<ul>\n<li>Anaconda with python3.5: <a href=\"https:\/\/www.continuum.io\/downloads\">https:\/\/www.continuum.io\/downloads<\/a><\/li>\n<li>Cuda 8.0: <a href=\"https:\/\/developer.nvidia.com\/cuda-downloads\">https:\/\/developer.nvidia.com\/cuda-downloads<\/a>. 
Running on the CPU is also possible (then gcc is enough and the next 4 steps are not needed), but on relatively large datasets the slowdown should be substantial (I have not checked)<\/li>\n<li>The path to nvcc is added to PATH (otherwise theano will not find it)<\/li>\n<li>Visual Studio 2015 with C++, including the windows 10 kit (corecrt.h will be needed)<\/li>\n<li>The path to cl.exe is added to PATH<\/li>\n<li>The path to corecrt.h is added to INCLUDE (in my case: C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.10240.0\\ucrt)<\/li>\n<li><code>conda install mingw libpython<\/code>: gcc and libpython are needed when the network is compiled<\/li>\n<li>and finally <code>pip install keras theano python-levenshtein 
gensim nltk<\/code> (it may also work with the keras backend switched from theano to tensorflow, but I have not tested that)<\/li>\n<li>.theanorc contains the following flag for gcc:<br \/>   <code> [gcc]<br \/>   cxxflags = -D_hypot=hypot<br \/>   <\/code>   <\/li>\n<li>Start python and run<br \/>   <code> import nltk<br \/>   nltk.download()<br \/>   <\/code>   <\/li>\n<\/ul>\n<h3>Text processing<\/h3>\n<p>  <\/p>\n<p> At this stage, stop words that are not part of a combination from the &quot;whitelist&quot; (described below) are removed, and the remaining tokens are vectorized. 
The inputs of the algorithm are:   <\/p>\n<ul>\n<li>the language: nltk needs it for tokenization and for returning the stop word list<\/li>\n<li>a &quot;whitelist&quot; of word combinations in which stop words are used. For example, &quot;on&quot; counts as a stop word, but [&quot;turn&quot;, &quot;on&quot;] is a different matter<\/li>\n<li>the word2vec vectors<\/li>\n<\/ul>\n<p>   And the algorithm itself (I see at least 2 possible improvements, but have not gotten to them):   <\/p>\n<ul>\n<li>Split the input text into tokens with nltk.tokenize (roughly, &quot;Hello, world!&quot; 
becomes [&quot;hello&quot;, &quot;,&quot;, &quot;world&quot;, &quot;!&quot;])<\/li>\n<li>Discard tokens that are not in the word2vec vocabulary.<br \/>   More precisely, tokens that are not there and for which no close-enough vocabulary word could be found. For now only the Levenshtein distance is used; one idea is to filter the tokens with the smallest Levenshtein distance by the distance from their vectors to the vectors that occur in the training set<\/li>\n<li>Select the tokens:<br \/> \n<ul>\n<li>that are not in the stop word list 
(this reduced the error on the weather dataset, but without the next step it badly hurt the result on the &quot;car_intents&quot; one).<\/li>\n<li>if a token is in the stop word list, check whether the text contains any whitelist sequence that includes it (roughly, on finding &quot;on&quot;, check for sequences from the list [[&quot;turn&quot;, &quot;on&quot;]]). If one is found, keep the token after all. 
There is room for improvement: right now I only check (in our example) that &quot;turn&quot; is present, but it may not belong to this particular &quot;on&quot;.<\/li>\n<\/ul>\n<p>   <\/li>\n<li>Replace the selected tokens with their vectors.<\/li>\n<\/ul>\n<p>  <\/p>\n<h2>Show me the code<\/h2>\n<p>  <\/p>\n<p><div class=\"spoiler\"><b class=\"spoiler_title\">The code I used to evaluate the effect of the changes<\/b><\/p>\n<div class=\"spoiler_text\">\n
<pre><code class=\"python\">import itertools\nimport json\nimport numpy\nfrom gensim.models import Word2Vec\nfrom pynlc.test_data import reuters_classes, word2vec, car_classes, weather_classes\nfrom pynlc.text_classifier import TextClassifier\nfrom pynlc.text_processor import TextProcessor\nfrom sklearn.metrics import mean_squared_error\n\n\ndef classification_demo(data_path, train_before, test_before, train_epochs, test_labels_path, instantiated_test_labels_path, trained_path):\n    # Train on texts[:train_before], test on texts[train_before:test_before], save the results.\n    with open(data_path, 'r', encoding='utf-8') as data_source:\n        data = json.load(data_source)\n    texts = [item[&quot;text&quot;] for item in data]\n    class_names = [item[&quot;classes&quot;] for item in data]\n    train_texts = texts[:train_before]\n    train_classes = class_names[:train_before]\n    test_texts = texts[train_before:test_before]\n    test_classes = class_names[train_before:test_before]\n    text_processor = TextProcessor(&quot;english&quot;, [[&quot;turn&quot;, &quot;on&quot;], [&quot;turn&quot;, &quot;off&quot;]], Word2Vec.load_word2vec_format(word2vec))\n    classifier = TextClassifier(text_processor)\n    classifier.train(train_texts, train_classes, train_epochs, True)\n    prediction = classifier.predict(test_texts)\n    with open(test_labels_path, &quot;w&quot;, encoding=&quot;utf-8&quot;) as test_labels_output:\n        test_labels_output_lst = []\n        for i in range(0, len(prediction)):\n            test_labels_output_lst.append({\n                &quot;real&quot;: test_classes[i],\n                &quot;classified&quot;: prediction[i]\n            })\n        json.dump(test_labels_output_lst, test_labels_output)\n    # Rebuild the classifier from its config and predict again with the copy.\n    instantiated_classifier = TextClassifier(text_processor, **classifier.config)\n    instantiated_prediction = instantiated_classifier.predict(test_texts)\n    with open(instantiated_test_labels_path, &quot;w&quot;, encoding=&quot;utf-8&quot;) as instantiated_test_labels_output:\n        instantiated_test_labels_output_lst = []\n        for i in range(0, len(instantiated_prediction)):\n            instantiated_test_labels_output_lst.append({\n                &quot;real&quot;: test_classes[i],\n                &quot;classified&quot;: instantiated_prediction[i]\n            })\n        json.dump(instantiated_test_labels_output_lst, instantiated_test_labels_output)\n    with open(trained_path, &quot;w&quot;, encoding=&quot;utf-8&quot;) as trained_output:\n        json.dump(classifier.config, trained_output, ensure_ascii=True)\n\n\ndef classification_error(files):\n    # Print the mean squared error between true labels and predicted class probabilities.\n    for name in files:\n        with open(name, &quot;r&quot;, encoding=&quot;utf-8&quot;) as src:\n            data = json.load(src)\n        classes = []\n        real = []\n        for row in data:\n            classes.append(row[&quot;real&quot;])\n            classified = row[&quot;classified&quot;]\n            row_classes = list(classified.keys())\n            row_classes.sort()\n            real.append([classified[class_name] for class_name in row_classes])\n        labels = []\n        class_names = list(set(itertools.chain(*classes)))\n        class_names.sort()\n        for item_classes in classes:\n            labels.append([int(class_name in item_classes) for class_name in class_names])\n        real_np = numpy.array(real)\n        mse = mean_squared_error(numpy.array(labels), real_np)\n        print(name, mse)\n\n\nif __name__ == '__main__':\n    print(&quot;Reuters:\\n&quot;)\n    classification_demo(reuters_classes, 10000, 15000, 10,\n                        &quot;reuters_test_labels.json&quot;, &quot;reuters_car_test_labels.json&quot;,\n                        &quot;reuters_trained.json&quot;)\n    classification_error([&quot;reuters_test_labels.json&quot;, &quot;reuters_car_test_labels.json&quot;])\n    print(&quot;Car intents:\\n&quot;)\n    # The output file names must match the ones read by classification_error below.\n    classification_demo(car_classes, 400, 500, 20,\n                        &quot;cars_test_labels.json&quot;, &quot;instantiated_cars_test_labels.json&quot;,\n                        &quot;car_trained.json&quot;)\n    classification_error([&quot;cars_test_labels.json&quot;, &quot;instantiated_cars_test_labels.json&quot;])\n    print(&quot;Weather:\\n&quot;)\n    classification_demo(weather_classes, 40, 50, 30,\n                        &quot;weather_test_labels.json&quot;, &quot;instantiated_weather_test_labels.json&quot;,\n                        &quot;weather_trained.json&quot;)\n    classification_error([&quot;weather_test_labels.json&quot;, &quot;instantiated_weather_test_labels.json&quot;])\n<\/code><\/pre>\n
<p>  <\/div>\n<\/div>\n<p>  Here you can see:    <\/p>\n<ul>\n<li>Data preparation<br \/> \n<pre><code 
class=\"python\">with open(data_path, 'r', encoding='utf-8') as data_source:\n    data = json.load(data_source)\ntexts = [item[&quot;text&quot;] for item in data]\nclass_names = [item[&quot;classes&quot;] for item in data]\ntrain_texts = texts[:train_before]\ntrain_classes = class_names[:train_before]\ntest_texts = texts[train_before:test_before]\ntest_classes = class_names[train_before:test_before]\n<\/code><\/pre>\n<p>   <\/li>\n<li>Creating a new classifier<br \/> \n
<pre><code class=\"python\">text_processor = TextProcessor(&quot;english&quot;, [[&quot;turn&quot;, &quot;on&quot;], [&quot;turn&quot;, &quot;off&quot;]], Word2Vec.load_word2vec_format(word2vec))\nclassifier = TextClassifier(text_processor)\n<\/code><\/pre>\n<p>   <\/li>\n<li>Training it<br \/> \n
<pre><code class=\"python\">classifier.train(train_texts, train_classes, train_epochs, True)\n<\/code><\/pre>\n<p>   <\/li>\n<li> Predicting classes for the test set and saving pairs of &quot;real classes&quot; and &quot;predicted class probabilities&quot;<br \/> \n
<pre><code class=\"python\">prediction = classifier.predict(test_texts)\nwith open(test_labels_path, &quot;w&quot;, encoding=&quot;utf-8&quot;) as test_labels_output:\n    test_labels_output_lst = []\n    for i in range(0, len(prediction)):\n        test_labels_output_lst.append({\n            &quot;real&quot;: test_classes[i],\n            &quot;classified&quot;: prediction[i]\n        })\n    json.dump(test_labels_output_lst, test_labels_output)\n<\/code><\/pre>\n<p>   <\/li>\n<li>Creating a new classifier instance from the configuration (a dict that can be serialized to and deserialized from, for example, json)<br \/> \n
<pre><code class=\"python\">instantiated_classifier = TextClassifier(text_processor, **classifier.config)\n<\/code><\/pre>\n<p>   <\/li>\n<\/ul>\n<p>  The output looks roughly like this:   <\/p>\n
<pre>C:\\Users\\user\\pynlc-env\\lib\\site-packages\\gensim\\utils.py:840: UserWarning: detected Windows; aliasing chunkize to chunkize_serial\n  warnings.warn(&quot;detected Windows; aliasing chunkize to chunkize_serial&quot;)\nC:\\Users\\user\\pynlc-env\\lib\\site-packages\\gensim\\utils.py:1015: UserWarning: Pattern library is not installed, lemmatization won't be available.\n  warnings.warn(&quot;Pattern library is not installed, lemmatization won't be available.&quot;)\nUsing Theano backend.\nUsing gpu device 0: GeForce GT 730 (CNMeM is disabled, cuDNN not available)\nReuters:\nTrain on 3000 samples, validate on 7000 samples\nEpoch 1\/10\n20\/3000 [..............................] - ETA: 307s - loss: 0.6968 - acc: 0.5376\n....\n
3000\/3000 [==============================] - 640s - loss: 0.0018 - acc: 0.9996 - val_loss: 0.0019 - val_acc: 0.9996\nEpoch 8\/10\n20\/3000 [..............................] - ETA: 323s - loss: 0.0012 - acc: 0.9994\n...\n3000\/3000 [==============================] - 635s - loss: 0.0012 - acc: 0.9997 - val_loss: 9.2200e-04 - val_acc: 0.9998\nEpoch 9\/10\n20\/3000 [..............................] - ETA: 315s - loss: 3.4387e-05 - acc: 1.0000\n...\n3000\/3000 [==============================] - 879s - loss: 0.0012 - acc: 0.9997 - val_loss: 0.0016 - val_acc: 0.9995\nEpoch 10\/10\n20\/3000 [..............................] - ETA: 327s - loss: 8.0144e-04 - acc: 0.9997\n...\n3000\/3000 [==============================] - 655s - loss: 0.0012 - acc: 0.9997 - val_loss: 7.4761e-04 - val_acc: 0.9998\nreuters_test_labels.json 0.000151774189194\nreuters_car_test_labels.json 0.000151774189194\n\nCar intents:\nTrain on 280 samples, validate on 120 samples\nEpoch 1\/20\n20\/280 [=&gt;............................] - ETA: 0s - loss: 0.6729 - acc: 0.5250\n...\n280\/280 [==============================] - 0s - loss: 0.2914 - acc: 0.8980 - val_loss: 0.2282 - val_acc: 0.9375\n...\nEpoch 19\/20\n20\/280 [=&gt;............................] - ETA: 0s - loss: 0.0552 - acc: 0.9857\n...\n280\/280 [==============================] - 0s - loss: 0.0464 - acc: 0.9842 - val_loss: 0.1647 - val_acc: 0.9494\nEpoch 20\/20\n20\/280 [=&gt;............................] - ETA: 0s - loss: 0.0636 - acc: 0.9714\n...\n280\/280 [==============================] - 0s - loss: 0.0447 - acc: 0.9849 - val_loss: 0.1583 - val_acc: 0.9530\ncars_test_labels.json 0.0520754688092\ninstantiated_cars_test_labels.json 0.0520754688092\n\nWeather:\nTrain on 28 samples, validate on 12 samples\nEpoch 1\/30\n20\/28 [====================&gt;.........] - ETA: 0s - loss: 0.6457 - acc: 0.6000\n...\nEpoch 29\/30\n20\/28 [====================&gt;.........] - ETA: 0s - loss: 0.0021 - acc: 1.0000\n...\n
28\/28 [==============================] - 0s - loss: 0.0019 - acc: 1.0000 - val_loss: 0.1487 - val_acc: 0.9167\nEpoch 30\/30\n...\n28\/28 [==============================] - 0s - loss: 0.0018 - acc: 1.0000 - val_loss: 0.1517 - val_acc: 0.9167\nweather_test_labels.json 0.0136964029149\ninstantiated_weather_test_labels.json 0.0136964029149\n<\/pre>\n
<p>  Findings from the stop word experiments:  <\/p>\n<ul>\n<li> the error on the Reuters set stayed comparable regardless of whether stop words were removed or kept<\/li>\n<li> the error on the weather set dropped from 8% once stop words were removed. Making the algorithm more complex had no effect (there are no combinations here in which a stop word has to be kept). 
<\/li>\n<li> the error on the car_intent set rose to roughly 15% once stop words were removed (for example, a phrase such as &quot;turn on&quot; was cut down to &quot;turn&quot;). After whitelist handling was added, it returned to its previous level<\/li>\n<\/ul>\n<p>  <\/p>\n<h3>Example: running a pre-trained classifier<\/h3>\n<p>  <\/p>\n<p> The TextClassifier.config property is a dictionary that can be serialized, for example, to JSON; after loading it back from JSON, pass its elements to the TextClassifier constructor. 
For example:   <\/p>\n<pre><code class=\"python\">import json\nfrom gensim.models import Word2Vec\nfrom pynlc.test_data import word2vec\nfrom pynlc import TextProcessor, TextClassifier\n\nif __name__ == '__main__':\n    text_processor = TextProcessor(&quot;english&quot;, [[&quot;turn&quot;, &quot;on&quot;], [&quot;turn&quot;, &quot;off&quot;]],\n                                   Word2Vec.load_word2vec_format(word2vec))\n    # Restore the saved classifier config and pass its items to the constructor\n    with open(&quot;weather_trained.json&quot;, &quot;r&quot;, encoding=&quot;utf-8&quot;) as classifier_data_source:\n        classifier_data = json.load(classifier_data_source)\n    classifier = TextClassifier(text_processor, **classifier_data)\n    texts = [\n        &quot;Will it be windy or rainy at evening?&quot;,\n        &quot;How cold it'll be today?&quot;\n    ]\n    predictions = classifier.predict(texts)\n    # Print each text followed by its predicted class scores\n    for text, prediction in zip(texts, predictions):\n        print(text)\n        print(prediction)\n<\/code><\/pre>\n<p>   And its output:   <\/p>\n<pre>C:\\Users\\user\\pynlc-env\\lib\\site-packages\\gensim\\utils.py:840: UserWarning: detected Windows; aliasing chunkize to chunkize_serial\n  warnings.warn(&quot;detected Windows; aliasing chunkize to chunkize_serial&quot;)\nC:\\Users\\user\\pynlc-env\\lib\\site-packages\\gensim\\utils.py:1015: UserWarning: Pattern library is not installed, lemmatization won't be available.\n  warnings.warn(&quot;Pattern library is not installed, lemmatization won't be available.&quot;)\nUsing Theano backend.\nWill it be windy or rainy at evening?\n{'temperature': 0.039208538830280304, 'conditions': 0.9617446660995483}\nHow cold it'll be today?\n
{'temperature': 0.9986168146133423, 'conditions': 0.0016815820708870888}   <\/pre>\n<p>  <\/p>\n<p>And yes, the config of the network trained on the Reuters dataset is here: <a href=\"https:\/\/drive.google.com\/file\/d\/0B7cY3wBgM-aBWGh3NmFjSGVHVzA\/view?usp=sharing\">https:\/\/drive.google.com\/file\/d\/0B7cY3wBgM-aBWGh3NmFjSGVHVzA\/view?usp=sharing<\/a>. That is a gigabyte of network for a 19 MB dataset \ud83d\ude42 <\/p>\n<p> Link to the original article: <a href=\"https:\/\/habrahabr.ru\/post\/315118\/\"> https:\/\/habrahabr.ru\/post\/315118\/<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Oddly enough, this post is about a text classifier that uses a convolutional network (vectorizing individual words is a separate question). 
<a href=\"https:\/\/bitbucket.org\/alex43210\/pynlc\">Code, test data, and usage examples<\/a> are on Bitbucket (I ran into GitHub's size limits and its suggestion to use Git Large File Storage (LFS), and have not yet worked through that solution).<\/p>\n<h2>Datasets<\/h2>\n<p>  <\/p>\n<p>Converted versions of the following datasets are used: <a href=\"http:\/\/www.daviddlewis.com\/resources\/testcollections\/reuters21578\/\">http:\/\/www.daviddlewis.com\/resources\/testcollections\/reuters21578\/<\/a> (22000 records), <a href=\"https:\/\/github.com\/watson-developer-cloud\/car-dashboard\/blob\/master\/training\/car_workspace.json\">https:\/\/github.com\/watson-developer-cloud\/car-dashboard\/blob\/master\/training\/car_workspace.json<\/a> (530 records), <a href=\"https:\/\/github.com\/watson-developer-cloud\/natural-language-classifier-nodejs\/blob\/master\/training\/weather_data_train.csv\">https:\/\/github.com\/watson-developer-cloud\/natural-language-classifier-nodejs\/blob\/master\/training\/weather_data_train.csv<\/a> (50 records). 
<i>By the way, I would welcome a Russian-language text dataset dropped in the comments or in a private message (preferably in the comments).<\/i><\/p>\n<h2>Network architecture<\/h2>\n<p>  <\/p>\n<p>The starting point is an implementation of the network described here: <a href=\"https:\/\/arxiv.org\/abs\/1408.5882\">https:\/\/arxiv.org\/abs\/1408.5882<\/a>. The code of that implementation is at <a href=\"https:\/\/github.com\/alexander-rakhlin\/CNN-for-Sentence-Classification-in-Keras\">https:\/\/github.com\/alexander-rakhlin\/CNN-for-Sentence-Classification-in-Keras<\/a>.<br \/>  In my case, the network input consists of word vectors (the gensim implementation of word2vec is used). 
The network structure is shown below: <br \/>  <img decoding=\"async\" src=\"https:\/\/habrastorage.org\/files\/247\/f7a\/89d\/247f7a89d71d4f2ea3ce893b837b8589.png\"\/><br \/>  In short:  <\/p>\n<ul>\n<li>A text is represented as a matrix of shape word_count x word_vector_size. The individual word vectors come from word2vec, which you can read about, for example, <a href=\"https:\/\/habrahabr.ru\/post\/253227\/\">in this post<\/a>. Since I do not know in advance what text a user will submit, I take the length 2 * N, where N is the number of vectors in the longest text of the training set. 
Yes, that factor is a guess.<\/li>\n<li>The matrix is processed by the convolutional part of the network (its output is the transformed word features)<\/li>\n<li>The extracted features are processed by the fully connected part of the network<\/li>\n<\/ul>\n<p>  Stop words are filtered out beforehand (on the Reuters dataset this made no difference, but on the smaller sets it did have an effect). More on that below.  
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-280864","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts\/280864","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=280864"}],"version-history":[{"count":0,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=\/wp\/v2\/posts\/280864\/revisions"}],"wp:attachment":[{"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=280864"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=280864"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/savepearlharbor.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=280864"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}