{"_id":"59e70b9ee027ae002e7d2ee4","category":{"_id":"59e70b9ee027ae002e7d2ed8","version":"59e70b9ee027ae002e7d2ed2","project":"5496d393f52a630b00519cdd","__v":0,"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2014-12-26T11:41:35.509Z","from_sync":false,"order":5,"slug":"batch","title":"Batch"},"parentDoc":null,"project":"5496d393f52a630b00519cdd","user":"5496d353f52a630b00519cdc","version":{"_id":"59e70b9ee027ae002e7d2ed2","project":"5496d393f52a630b00519cdd","__v":2,"createdAt":"2017-10-18T08:06:54.462Z","releaseDate":"2017-10-18T08:06:54.462Z","categories":["59e70b9ee027ae002e7d2ed3","59e70b9ee027ae002e7d2ed4","59e70b9ee027ae002e7d2ed5","59e70b9ee027ae002e7d2ed6","59e70b9ee027ae002e7d2ed7","59e70b9ee027ae002e7d2ed8","59e70b9ee027ae002e7d2ed9","59e70b9ee027ae002e7d2eda","59e70b9ee027ae002e7d2edb","59e70b9ee027ae002e7d2edc","59e70b9ee027ae002e7d2edd","59e70b9ee027ae002e7d2ede","59e70b9ee027ae002e7d2edf","5b8661ccdd19310003a3fa0b"],"is_deprecated":false,"is_hidden":false,"is_beta":true,"is_stable":true,"codename":"","version_clean":"2.0.10","version":"2.0.10"},"githubsync":"","__v":0,"updates":[],"next":{"pages":[],"description":""},"createdAt":"2014-12-26T12:15:35.126Z","link_external":false,"link_url":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":1,"body":"[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Export Files Access\"\n}\n[/block]\nPlacer supports export using Google Cloud Storage (GCS) or Amazon Web Services (AWS) S3.\n\n##Using Google Cloud Storage (GCS)\n\n1 - Create a Google Cloud Storage bucket\n\n2 - Create a new Service Account and provide your Placer account manager the credentials file associated with the account (json file)\n\n3 - Provide 'Storage Admin' permission for the bucket, to the newly created Service Account\n\n4 - All done! 
Once enabled, Placer's data will be uploaded to the desired bucket.\n\n\n\n##Using Amazon Web Services (AWS) S3\n1 - Create an S3 bucket for receiving the export files\n\n2 - Create a user and provide your Placer account manager the following credentials:\n  * AWS_ACCESS_KEY_ID\n  * AWS_SECRET_ACCESS_KEY \n  * The user’s ARN\n\n3 - Add the following User-Policy for this user:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"{\\n    \\\"Version\\\": \\\"2012-10-17\\\",\\n    \\\"Statement\\\": [\\n        {\\n            \\\"Effect\\\": \\\"Allow\\\",\\n            \\\"Action\\\": [\\n                \\\"s3:PutObject\\\",\\n                \\\"s3:GetObject\\\",\\n                \\\"s3:DeleteObject\\\",\\n                \\\"s3:ListBucket\\\",\\n                \\\"s3:GetBucketLocation\\\"\\n            ],\\n            \\\"Resource\\\": [                \\n                \\\"arn:aws:s3:::{{bucket goes here}}\\\",\\n                \\\"arn:aws:s3:::{{bucket goes here}}/*\\\"\\n            ]\\n        }\\n    ]\\n}\",\n      \"language\": \"json\",\n      \"name\": \"User Policy\"\n    }\n  ]\n}\n[/block]\n[Why we require these bucket permissions?](doc:required-aws-bucket-permissions) \n\n4 - Test bucket policy by downloading and runing the [placer_s3_export.py test script](https://storage.googleapis.com/public_drops/placer_s3_export_test.py) as follows:\n*(replace <bucket-name> <access-key> <access-secret> with the relevant parameters)* \n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"python placer_s3_export_test.py <bucket-name> <access-key> <access-secret>\",\n      \"language\": \"shell\"\n    }\n  ]\n}\n[/block]\nMissing User-Policy permissions will be listed in the script response.\n\nIf there are no issues listed, you are all set! Once enabled, Placer's data will be uploaded to the desired bucket.\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"SOURCE_URL Structure\"\n}\n[/block]\nEach SOURCE_URL received from Placer contains the following format:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"gs://<YOUR_BUCKET>/results/<APP_KEY>/<DATE>/ \",\n      \"language\": \"shell\",\n      \"name\": \"GCS\"\n    },\n    {\n      \"code\": \"s3://<YOUR_BUCKET>/results/<APP_KEY>/<DATE>/ \",\n      \"language\": \"text\",\n      \"name\": \"AWS S3\"\n    }\n  ]\n}\n[/block]\nThe last folder in the URL path (<DATE>) should contain the query date in a yyyy-mm-dd format (e.g. 2014-08-29). **To get complete and accurate user data, query must be at least 24 hours post the desired date. **\n\n##Example\nQuerying for users data for the 29 of August 2014, should not be done before the 31 of August at 12:01 AM.\nFor example the SOURCE_URL for this query can look like this: \n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"gs://my_cool_company/results/6h3a7rez1rij74kzfb1iws0beyshn33s/2014-08-29\",\n      \"language\": \"shell\",\n      \"name\": \"GCS\"\n    },\n    {\n      \"code\": \"s3://my_cool_company/results/6h3a7rez1rij74kzfb1iws0beyshn33s/2014-08-29\",\n      \"language\": \"text\",\n      \"name\": \"AWS S3\"\n    }\n  ]\n}\n[/block]\n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Received Data Structure\"\n}\n[/block]\nFor each date, Placer will group the generated data of your users into 16 different folders. 
\nGrouping is done based on the first character of the hashing result when implementing SHA-256 of the User ID (therefore the 16 different folders).\n\nIn case you have more than 20 million active devices, grouping will be done using the first 2 characters resulting in 256 different folders.\n\nIn each folder, a compressed gzip file will be generated for each one of the desired API Endpoints (insights, places, etc.). This file will contain a row for each of the users in a JSON format.\n[block:callout]\n{\n  \"type\": \"warning\",\n  \"title\": \"Important\",\n  \"body\": \"1 - The export file format is a [JSON Lines](http://jsonlines.org/) file. This means that the file itself is not a JSON array, but rather a JSON per line. This allows processing the export data as a stream when parsing and importing into a local DB.\\n\\n2 - If the device is not associated with a User ID, grouping of folders will be done based on the SHA-256 of the Device ID. \\n\\n3 - Data are not returned if the device did not transmit any data on the desired date.\\n\\n4 - Export data is stored for 30 days, afterwards it will be deleted.\"\n}\n[/block]\n\n##Example\nIf the user ID is **b579244c-9fc5-4921-96f9-a2e5505b757f**, SHA-256 of it will be **ca574ba5dcce19cf0e8a85c4286941e80c1089f292e8203221915c058d7c7be4** and the profile will be available in the **\"sha256_prefix_c\"** folder.\nIn that folder, the user \"places\" will be available in the \"places.gz\" file and the insights in the \"insights.gz\" file.\n\nA single data row in the insights.gz file may look like this:\n[block:code]\n{\n  \"codes\": [\n    {\n      \"code\": \"{\\n  \\\"app_user_id_sha256\\\": \\\"ca574ba5dcce19cf0e8a85c4286941e80c1089f292e8203221915c058d7c7be4\\\",\\n  \\\"app_user_id\\\": \\\"b579244c-9fc5-4921-96f9-a2e5505b757f\\\",\\n  \\\"device_id\\\": \\\"536340ec6498b3000c9be739\\\",\\n  \\\"insights\\\": {\\n    \\\"interests\\\": [\\n      {\\n        \\\"confidence\\\": \\\"high\\\",\\n        \\\"interest\\\": \\\"Business / Finance\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"high\\\",\\n        \\\"interest\\\": \\\"Deals\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"high\\\",\\n        \\\"interest\\\": \\\"Dining\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"high\\\",\\n        \\\"interest\\\": \\\"Grocery Shopping\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"high\\\",\\n        \\\"interest\\\": \\\"Medical Services\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"high\\\",\\n        \\\"interest\\\": \\\"Pharmaceuticals\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"medium\\\",\\n        \\\"interest\\\": \\\"Shopping & Retail\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"medium\\\",\\n        \\\"interest\\\": \\\"Social Networking\\\"\\n      },\\n      {\\n        \\\"confidence\\\": \\\"medium\\\",\\n        \\\"interest\\\": \\\"Travel\\\"\\n      }\\n    ],\\n    \\\"gender\\\": {\\n      \\\"gender\\\": \\\"Female\\\",\\n      \\\"confidence\\\": \\\"medium\\\"\\n    },\\n    \\\"flights\\\": {\\n      \\\"yearly_count\\\": 1,\\n      \\\"degree\\\": 1\\n    },\\n    \\\"home\\\": [\\n      {\\n        \\\"is_primary\\\": true,\\n        \\\"place\\\": {\\n          \\\"estimated_address\\\": {\\n            \\\"city\\\": \\\"Limerick\\\",\\n            \\\"cc\\\": \\\"IE\\\",\\n            \\\"country\\\": \\\"Ireland\\\",\\n            \\\"source\\\": \\\"Google Maps\\\",\\n            \\\"formatted_city\\\": \\\"Limerick, 
Ireland\\\",\\n            \\\"formatted_address\\\": \\\"8 Clareview Ave, Limerick, Ireland\\\",\\n            \\\"street_address\\\": \\\"8 Clareview Ave\\\"\\n          },\\n          \\\"estimated_geolocation\\\": {\\n            \\\"lat\\\": 52.6698382,\\n            \\\"long\\\": -8.6343854,\\n            \\\"accuracy\\\": 15.825344109886\\n          },\\n          \\\"type\\\": \\\"Home\\\",\\n          \\\"first_visit_time\\\": \\\"2014-08-24T00:11:11.896000\\\",\\n          \\\"last_visit_time\\\": \\\"2015-01-12T08:45:05.873000\\\",\\n          \\\"venues_info\\\": [\\n            {\\n              \\\"url\\\": null,\\n              \\\"venue_name\\\": null,\\n              \\\"venue_type\\\": null,\\n              \\\"wifi_match\\\": null\\n            }\\n          ]\\n        }\\n      }\\n    ],\\n    \\\"transits\\\": {\\n      \\\"avg_kms_per_day\\\": 8.0663928985596,\\n      \\\"degree\\\": 3\\n    }\\n  }\\n}\",\n      \"language\": \"json\"\n    }\n  ]\n}\n[/block]","excerpt":"*This document describes the steps required to download daily batch files of user data using Google Cloud Storage (GCS) or Amazon Web Services (AWS) S3.\nUsing this process you will be able to download any insights available through the Placer REST API, instead of querying them one by one. Use this process if you need to run periodic batch processing on all user data.*","slug":"data-export","type":"basic","title":"Data Export"}

# Data Export

*This document describes the steps required to download daily batch files of user data using Google Cloud Storage (GCS) or Amazon Web Services (AWS) S3. This process lets you download all of the insights available through the Placer REST API in bulk, instead of querying them one by one. Use it if you need to run periodic batch processing on all user data.*

[block:api-header] { "type": "basic", "title": "Export Files Access" } [/block] Placer supports export using Google Cloud Storage (GCS) or Amazon Web Services (AWS) S3. ##Using Google Cloud Storage (GCS) 1 - Create a Google Cloud Storage bucket 2 - Create a new Service Account and provide your Placer account manager the credentials file associated with the account (json file) 3 - Provide 'Storage Admin' permission for the bucket, to the newly created Service Account 4 - All done! Once enabled, Placer's data will be uploaded to the desired bucket. ##Using Amazon Web Services (AWS) S3 1 - Create an S3 bucket for receiving the export files 2 - Create a user and provide your Placer account manager the following credentials: * AWS_ACCESS_KEY_ID * AWS_SECRET_ACCESS_KEY * The user’s ARN 3 - Add the following User-Policy for this user: [block:code] { "codes": [ { "code": "{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Action\": [\n \"s3:PutObject\",\n \"s3:GetObject\",\n \"s3:DeleteObject\",\n \"s3:ListBucket\",\n \"s3:GetBucketLocation\"\n ],\n \"Resource\": [ \n \"arn:aws:s3:::{{bucket goes here}}\",\n \"arn:aws:s3:::{{bucket goes here}}/*\"\n ]\n }\n ]\n}", "language": "json", "name": "User Policy" } ] } [/block] [Why we require these bucket permissions?](doc:required-aws-bucket-permissions) 4 - Test bucket policy by downloading and runing the [placer_s3_export.py test script](https://storage.googleapis.com/public_drops/placer_s3_export_test.py) as follows: *(replace <bucket-name> <access-key> <access-secret> with the relevant parameters)* [block:code] { "codes": [ { "code": "python placer_s3_export_test.py <bucket-name> <access-key> <access-secret>", "language": "shell" } ] } [/block] Missing User-Policy permissions will be listed in the script response. If there are no issues listed, you are all set! Once enabled, Placer's data will be uploaded to the desired bucket. [block:api-header] { "type": "basic", "title": "SOURCE_URL Structure" } [/block] Each SOURCE_URL received from Placer contains the following format: [block:code] { "codes": [ { "code": "gs://<YOUR_BUCKET>/results/<APP_KEY>/<DATE>/ ", "language": "shell", "name": "GCS" }, { "code": "s3://<YOUR_BUCKET>/results/<APP_KEY>/<DATE>/ ", "language": "text", "name": "AWS S3" } ] } [/block] The last folder in the URL path (<DATE>) should contain the query date in a yyyy-mm-dd format (e.g. 2014-08-29). **To get complete and accurate user data, query must be at least 24 hours post the desired date. ** ##Example Querying for users data for the 29 of August 2014, should not be done before the 31 of August at 12:01 AM. For example the SOURCE_URL for this query can look like this: [block:code] { "codes": [ { "code": "gs://my_cool_company/results/6h3a7rez1rij74kzfb1iws0beyshn33s/2014-08-29", "language": "shell", "name": "GCS" }, { "code": "s3://my_cool_company/results/6h3a7rez1rij74kzfb1iws0beyshn33s/2014-08-29", "language": "text", "name": "AWS S3" } ] } [/block] [block:api-header] { "type": "basic", "title": "Received Data Structure" } [/block] For each date, Placer will group the generated data of your users into 16 different folders. Grouping is done based on the first character of the hashing result when implementing SHA-256 of the User ID (therefore the 16 different folders). In case you have more than 20 million active devices, grouping will be done using the first 2 characters resulting in 256 different folders. 
In each folder, a compressed gzip file is generated for each of the desired API endpoints (insights, places, etc.). Each file contains one row per user, in JSON format.

> **Important**
>
> 1 - The export file format is [JSON Lines](http://jsonlines.org/): the file itself is not a JSON array, but rather one JSON document per line. This allows the export data to be processed as a stream when parsing and importing it into a local DB.
>
> 2 - If a device is not associated with a User ID, folder grouping is based on the SHA-256 of the Device ID.
>
> 3 - No data is returned for a device that did not transmit any data on the desired date.
>
> 4 - Export data is stored for 30 days, after which it is deleted.

### Example

If the user ID is **b579244c-9fc5-4921-96f9-a2e5505b757f**, its SHA-256 is **ca574ba5dcce19cf0e8a85c4286941e80c1089f292e8203221915c058d7c7be4**, so the profile will be available in the **"sha256_prefix_c"** folder.
In that folder, the user's "places" will be available in the "places.gz" file and the insights in the "insights.gz" file.

A single data row in the insights.gz file may look like this:

```json
{
  "app_user_id_sha256": "ca574ba5dcce19cf0e8a85c4286941e80c1089f292e8203221915c058d7c7be4",
  "app_user_id": "b579244c-9fc5-4921-96f9-a2e5505b757f",
  "device_id": "536340ec6498b3000c9be739",
  "insights": {
    "interests": [
      { "confidence": "high", "interest": "Business / Finance" },
      { "confidence": "high", "interest": "Deals" },
      { "confidence": "high", "interest": "Dining" },
      { "confidence": "high", "interest": "Grocery Shopping" },
      { "confidence": "high", "interest": "Medical Services" },
      { "confidence": "high", "interest": "Pharmaceuticals" },
      { "confidence": "medium", "interest": "Shopping & Retail" },
      { "confidence": "medium", "interest": "Social Networking" },
      { "confidence": "medium", "interest": "Travel" }
    ],
    "gender": {
      "gender": "Female",
      "confidence": "medium"
    },
    "flights": {
      "yearly_count": 1,
      "degree": 1
    },
    "home": [
      {
        "is_primary": true,
        "place": {
          "estimated_address": {
            "city": "Limerick",
            "cc": "IE",
            "country": "Ireland",
            "source": "Google Maps",
            "formatted_city": "Limerick, Ireland",
            "formatted_address": "8 Clareview Ave, Limerick, Ireland",
            "street_address": "8 Clareview Ave"
          },
          "estimated_geolocation": {
            "lat": 52.6698382,
            "long": -8.6343854,
            "accuracy": 15.825344109886
          },
          "type": "Home",
          "first_visit_time": "2014-08-24T00:11:11.896000",
          "last_visit_time": "2015-01-12T08:45:05.873000",
          "venues_info": [
            {
              "url": null,
              "venue_name": null,
              "venue_type": null,
              "wifi_match": null
            }
          ]
        }
      }
    ],
    "transits": {
      "avg_kms_per_day": 8.0663928985596,
      "degree": 3
    }
  }
}
```
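Because each export file is gzip-compressed JSON Lines (one JSON document per line, unlike the pretty-printed row above), it can be parsed as a stream without loading the whole file into memory. A minimal sketch, assuming a file named insights.gz has already been downloaded locally:

```python
import gzip
import json

# Stream-parse a downloaded export file: JSON Lines means one JSON document per line.
with gzip.open("insights.gz", "rt", encoding="utf-8") as export_file:
    for line in export_file:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Each row carries the hashed and raw user identifiers plus the payload
        # for this endpoint (here, "insights").
        print(record["app_user_id_sha256"], list(record.get("insights", {}).keys()))
```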